We study and solve the problem of distilling secret key from quantum states representing correlation between two parties (Alice and Bob) and an eavesdropper (Eve) via one-way public discussion: we prove a coding theorem to achieve the "wire-tapper" bound, the difference of the mutual information Alice-Bob and that of Alice-Eve, for so-called cqq-correlations, via one-way public communication. This result yields information-theoretic formulas for the distillable secret key, giving "ultimate" key rate bounds if Eve is assumed to possess a purification of Alice and Bob's joint state.

Specialising our protocol somewhat and making it coherent leads us to a protocol of entanglement distillation via one-way LOCC (local operations and classical communication) which is asymptotically optimal: in fact we prove the so-called "hashing inequality", which says that the coherent information (i.e., the negative conditional von Neumann entropy) is an achievable EPR rate. This result is well known to imply a whole set of distillation and capacity formulas, which we briefly discuss.

I. INTRODUCTION

Entanglement and secret correlation share an "exclusiveness" (in the one case towards the total outside world, in the other towards an entity "Eve") that has led quantum information scientists to speculate on a systematic relation between their theories: the works in this direction range from building analogies [14], to using entanglement to prove information theoretic security of quantum key distribution [?], to attempts to prove the equivalence of the distillability of secret key and of entanglement [?].

Of course there are also conceptual differences: while the task of distilling secret perfect correlation derives from potential cryptographic applications (and requires a third, malicious, party to formulate the operational problem), entanglement is useful for simple transmission tasks between two perfectly cooperating parties, as exemplified by dense coding [?] and teleportation [8].

The present paper falls into the third of the above categories, for we address, in a unified way, the two questions of distilling secret key from many copies of a quantum state by public discussion (itself a generalisation of classical information theoretic work begun by Maurer [?] and by Ahlswede and Csiszár [?]) and of distilling EPR pairs by local operations and classical communication (LOCC). To be more precise, after describing a protocol for secret key distillation from a state by one-way public discussion, we show how secrecy codes of a particular structure can be converted into one-way LOCC entanglement distillation protocols achieving the coherent information, as was conjectured for some time under the name of the "hashing inequality" (after the hashing protocol in [6], which attains the bound for Bell-diagonal two-qubit states). It is well known from [?] that this inequality yields information theoretic characterisations of distillable entanglement under general LOCC, as well as of the quantum transmission capacity without, with forward and with bidirectional classical side channel (the first of these capacity theorems was proved recently by Shor [?], following a heuristic argument of Lloyd [33], and subsequently in [?]). Our approach is very close to that of [?] and, as far as secret key distillation is concerned, to the work [?]: while here our resource is a three-party quantum state ("static" model), these papers deal with the "dynamic" analogue, where the resource is a quantum wiretap channel.

As for the structure of the paper: the main result of the cryptographic part is Theorem 1 in Section II; the form of the optimal rates is then not hard to obtain, as we shall show in the detailed discussion. It is Theorem 1 which we return to in the entanglement distillation part: a very general modification of the coding procedure will give us Theorem 10, the hashing inequality; and as before, the form of the optimal rates is not hard to get from there. A reader only interested in entanglement distillation can thus skip the second part of Section II, where the general form of optimal one-way distillable secret key is derived. In Section III we turn to one-way entanglement distillation, proving the hashing inequality and exhibiting the general form of optimal one-way distillation; then in Section IV the consequences of the hashing inequality are detailed. Appendices collect the necessary facts about typical subspaces (A), some miscellaneous lemmas (B) and miscellaneous proofs (C).
II. ONE-WAY SECRET KEY DISTILLATION

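We will first study and solve the case of cqq-correlations, i.e., the initial state ρ^{ABE} has the form

$$\rho^{ABE} = \sum_x P(x)\, |x\rangle\langle x|^A \otimes \rho_x^{BE}. \qquad (1)$$

Then n copies of that state can be written

$$\left(\rho^{ABE}\right)^{\otimes n} = \sum_{x^n} P^n(x^n)\, |x^n\rangle\langle x^n|^A \otimes \rho_{x^n}^{BE},$$

with

$$|x^n\rangle = |x_1\rangle \otimes \cdots \otimes |x_n\rangle \quad\text{and}\quad \rho_{x^n}^{BE} = \rho_{x_1}^{BE} \otimes \cdots \otimes \rho_{x_n}^{BE}.$$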


Let X be a random variable with distribution P and, corresponding to the n copies of ρ, consider independent identically distributed (i.i.d.) realisations X_1, ..., X_n of X.

A one-way key distillation protocol consists of:

• A channel T : x^n ↦ (ℓ, m), with range ℓ ∈ {1, ..., L} and m ∈ {1, ..., M}.

• A POVM D^{(ℓ)} = (D^{(ℓ)}_m)_{m=1}^M on B^n for every ℓ.

The idea is that Alice generates T(X^n) = (Λ, K); her version of the key is K = m, while she sends Λ = ℓ to Bob. He obtains his version K' by measuring his system B^n using D^{(ℓ)}:

$$\Pr\{K' = m \,|\, \Lambda = \ell, X^n = x^n\} = \mathrm{Tr}\big(D^{(\ell)}_m \rho^B_{x^n}\big).$$

For technical reasons we assume that the communication has bounded rate, L ≤ 2^{nF}, for some constant F.
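We call this an (n, ε)-protocol if

1. Pr{K ≠ K'} ≤ ε.

2. ‖Dist(K) − (1/M) 1_{{1,...,M}}‖_1 ≤ ε, i.e. the key is almost uniformly distributed.

3. There is a state σ_0 such that for all m,

$$\Big\| \sum_{x^n, \ell} \Pr\{X^n = x^n, \Lambda = \ell \,|\, K = m\}\, |\ell\rangle\langle\ell| \otimes \rho^E_{x^n} - \sigma_0 \Big\|_1 \le \epsilon,$$

i.e. Eve's state, together with the public message ℓ, is almost independent of the key.

We call R an achievable rate if for all n there exist (n, ε)-protocols with ε → 0 and (1/n) log M → R as n → ∞. (The convention in this paper is that log and exp are understood to be to base 2.) Finally define

$$K_\to(\rho) := \sup\{R : R \text{ achievable}\},$$

the one-way (or forward) secret key capacity of ρ.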

Before we can formulate our first main result, we have to introduce some information notation: for a quantum state ρ we denote the von Neumann entropy H(ρ) = −Tr ρ log ρ, and the Shannon entropy of a probability distribution P, H(P) = −Σ_x P(x) log P(x). If the state is the reduced state of a multi-party state, like the ρ^{ABE} above, we write H(A) = H(ρ^A), etc. In the particular case of eq. (1), obviously H(ρ^A) = H(P). For a general bipartite state ρ define the (quantum) mutual information

$$I(A:B) = H(A) + H(B) - H(AB),$$

which for the cq-state ρ^{AB} of eq. (1) is easily checked to be equal to

$$H(\rho^B) - \sum_x P(x)\, H(\rho_x^B),$$

a quantity known as the Holevo bound [?] and which we denote I(P; ρ^B), reflecting in the notation the distribution P and the cq-channel with channel states ρ_x^B. We shall often use the abbreviation I(X;B) for this latter quantity if the states and distribution of the random variable X are implicitly clear: this notation has the advantage that for any U jointly distributed with X, I(U;B) makes sense immediately, without our having to write down a composite state.
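The identity between I(A:B) and the Holevo quantity for cq-states is easy to check numerically. The following Python sketch is our own illustration (the helper names `entropy`, `holevo`, `cq_state` and `mutual_information_cq` are hypothetical); it computes both sides for two equiprobable non-orthogonal qubit states, with entropies in bits to match the base-2 convention.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits (log to base 2, as in the text)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]  # discard numerical zeros
    return float(-np.sum(evals * np.log2(evals)))

def holevo(P, states):
    """Holevo quantity I(P; rho) = H(sum_x P(x) rho_x) - sum_x P(x) H(rho_x)."""
    avg = sum(p * r for p, r in zip(P, states))
    return entropy(avg) - sum(p * entropy(r) for p, r in zip(P, states))

def cq_state(P, states):
    """Assemble sum_x P(x) |x><x|^A (x) rho_x^B as a matrix on A (x) B."""
    k = len(P)
    rho = 0
    for x, (p, r) in enumerate(zip(P, states)):
        proj = np.zeros((k, k))
        proj[x, x] = 1.0
        rho = rho + p * np.kron(proj, r)
    return rho

def mutual_information_cq(P, states):
    """I(A:B) = H(A) + H(B) - H(AB) evaluated on the assembled cq-state."""
    rho_B = sum(p * r for p, r in zip(P, states))
    return entropy(np.diag(P)) + entropy(rho_B) - entropy(cq_state(P, states))

# Example: equiprobable non-orthogonal pure states |0> and |+>.
proj0 = np.array([[1, 0], [0, 0]], dtype=complex)
plus = 0.5 * np.array([[1, 1], [1, 1]], dtype=complex)
P = [0.5, 0.5]
print(round(holevo(P, [proj0, plus]), 4), round(mutual_information_cq(P, [proj0, plus]), 4))
```

Both numbers come out as h(cos²(π/8)) ≈ 0.6009, where h is the binary entropy.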

Finally, for a tripartite state ρ^{ABC}, define the (quantum) conditional mutual information

$$I(A:C|B) := H(AB) + H(BC) - H(ABC) - H(B),$$

which is non-negative by strong subadditivity [36]. Usually the state these notations refer to will be clear from the context; where not, we add it in subscript. Observe that for a classical conditioning system, the conditional mutual information takes the form of a probability average over mutual informations: e.g., for the state of eq. (1),

$$I(B:E|A) = \sum_x P(x)\, I(B:E)_{\rho_x}.$$

Also for conditional mutual information we make use of the hybrid notation involving random variables: for example, for random variables T and U jointly distributed with X, I(U;B|T) is the average over T of Holevo quantities as above.

Theorem 1 For every cqq-state ρ,

$$K_\to(\rho) \ge I(X;B) - I(X;E).$$

Proof. The idea is as follows: the state

$$\rho^{AB} = \sum_x P(x)\, |x\rangle\langle x|^A \otimes \rho_x^B$$

contains the description of a cq-channel with channel states ρ_x^B. We will cover "evenly" all typical type classes of block length n by channel codes C_ℓ to transmit ≈ nI(X;B) bits, most of which are "good" in the sense that they have small error probability. All of them are of the kind that the state of E, when taking the average over the last ≈ nI(X;E) bits of the input, is almost a constant operator, σ(Q), independent of the leading bits.

The key distillation scheme then works as follows: on observing x^n, which is typical with high probability, Alice announces its type and a random ℓ such that x^n is a codeword of the code C_ℓ to Bob. He is able to decode it with high probability (because the code will be good with high probability), and they take the leading

≈ n(I(X;B) − I(X;E)) bits

of the message as the key. This is uniformly distributed because the code lies entirely within one type class. Eve knows almost nothing about the key since she only has a state very close to σ(Q), independent of the key.
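For a concrete feel for the rate in Theorem 1, here is a minimal Python sketch under a hypothetical cqq-state of our choosing: x is uniform on {0, 1}, Bob receives orthogonal states, and Eve holds one of two pure states with real overlap c. In that case the bound I(X;B) − I(X;E) evaluates to 1 − h((1+c)/2), with h the binary entropy. The function names are illustrative only.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def holevo(P, states):
    """I(P; rho) = H(sum_x P(x) rho_x) - sum_x P(x) H(rho_x)."""
    avg = sum(p * r for p, r in zip(P, states))
    return entropy(avg) - sum(p * entropy(r) for p, r in zip(P, states))

def wiretap_rate(P, rho_B, rho_E):
    """Lower bound of Theorem 1, I(X;B) - I(X;E); may be negative (then useless)."""
    return holevo(P, rho_B) - holevo(P, rho_E)

def pure(v):
    v = np.asarray(v, dtype=complex)
    return np.outer(v, v.conj())

def h(q):
    """Binary entropy in bits."""
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

c = 0.8  # overlap <e_0|e_1> of Eve's two states (hypothetical value)
rho_B = [pure([1, 0]), pure([0, 1])]
rho_E = [pure([1, 0]), pure([c, np.sqrt(1 - c ** 2)])]
rate = wiretap_rate([0.5, 0.5], rho_B, rho_E)
print(round(rate, 4), round(1 - h((1 + c) / 2), 4))
```

The closed form follows because Eve's average state has eigenvalues (1 ± c)/2, while each of her conditional states is pure.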



Figure 1: A schematic view of the anatomy of the code: the typical sequences are covered by sets C_ℓ, which are good transmission codes for B. A magnified view of one C_ℓ (to the lower right) reveals its inner structure: it is composed of sets S_ℓm, which are good privacy amplification codes against E.

In precise detail: let Q be an n-type. (Ultimately we will only be interested in typical Q, i.e. ‖P − Q‖_1 ≤ δ.) Consider random variables U^{(ℓms)}, independent and identically distributed (i.i.d.) according to the uniform distribution on the type class T_Q (see Appendix A), for ℓ = 1, ..., L, m = 1, ..., M, s = 1, ..., S. Let

$$\sigma(Q) := \frac{1}{|T_Q|} \sum_{x^n \in T_Q} \rho^E_{x^n}.$$

We are interested in the probability of the following random events (for 0 < ε < 1/2):

ε-Evenness: for all x^n ∈ T_Q,

$$(1-\epsilon)\frac{LMS}{|T_Q|} \le \sum_{\ell, m, s} 1_{U^{(\ell m s)}}(x^n) \le (1+\epsilon)\frac{LMS}{|T_Q|},$$

with the indicator functions 1_{U^{(ℓms)}} on T_Q.

ε-Secrecy: for all ℓ, m, the average of ρ^E_{x^n} over S_ℓm = {U^{(ℓms)} : s = 1, ..., S} is close to σ(Q):

$$\Big\| \frac{1}{S} \sum_{s=1}^S \rho^E_{U^{(\ell m s)}} - \sigma(Q) \Big\|_1 \le \epsilon.$$

Codes C_ℓ are ε-good: define the code C_ℓ as the collection of codewords (U^{(ℓms)})_{m,s}. We call it ε-good if there exists a POVM (D^{(ℓ)}_{ms})_{m,s} such that

$$\frac{1}{MS} \sum_{m,s} \mathrm{Tr}\big(\rho^B_{U^{(\ell m s)}} D^{(\ell)}_{ms}\big) \ge 1 - \epsilon.$$


Using the Chernoff bound for the indicator functions 1_{U^{(ℓms)}} evaluated at all points in T_Q (Lemma 3 and the following remarks), we obtain

$$\Pr\{\epsilon\text{-evenness}\} \ge 1 - 2|\mathcal{X}|^n \exp\left(-LMS\,\frac{\epsilon^2}{2\ln 2\,|T_Q|}\right). \qquad (2)$$



Proposition 4 gives us (observing MS ≤ |X|^n), for every δ > 0 and sufficiently large n,

$$\Pr\{\epsilon\text{-secrecy}\} \ge 1 - 2 d_E^{\,n} |\mathcal{X}|^n \exp\left(-S t\,\frac{\epsilon^2}{288 \ln 2}\right), \qquad (3)$$

with log t = −n(I(Q;ρ^E) + δ).

Finally, Proposition 5 yields for every δ > 0, if MS ≤ exp(n(I(Q;ρ^B) − δ)) (n sufficiently large),

$$\forall \ell \quad \Pr\{C_\ell \ \epsilon\text{-good}\} \ge 1 - \epsilon. \qquad (4)$$



Since the individual events in this equation are independent, another application of the Chernoff bound (to the indicator function of "ε-goodness") gives

$$\Pr\{\text{a fraction } 1 - 2\epsilon \text{ of the } C_\ell \text{ is } \epsilon\text{-good}\} \ge 1 - \exp(\ldots). \qquad (5)$$



Thus, if we pick

$$S = \exp\big[n(I(Q;\rho^E) + 2\delta)\big],$$
$$M = \exp\big[n(I(Q;\rho^B) - I(Q;\rho^E) - 3\delta)\big],$$
$$L = \exp\big[n(H(Q) - I(Q;\rho^B) + 2\delta)\big],$$

and observe |T_Q| ≤ exp(nH(Q)), the right hand sides of eqs. (2), (3) and (5) converge to 1 as n → ∞, and hence by the union bound also the conjunction of these three events approaches unit probability asymptotically.
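The choice of S, M and L can be checked by adding exponents. Writing I_B = I(Q;ρ^B) and I_E = I(Q;ρ^E), and assuming the parameter of Proposition 4 is read as log t = −n(I(Q;ρ^E) + δ), the bookkeeping runs as follows (our expansion of the argument, not a quotation):

```latex
\begin{aligned}
\log(LMS) &= n\big[(H(Q) - I_B + 2\delta) + (I_B - I_E - 3\delta) + (I_E + 2\delta)\big] = n\big(H(Q) + \delta\big),\\
\frac{LMS}{|T_Q|} &\ge 2^{n\delta} \quad \text{since } |T_Q| \le 2^{nH(Q)} \qquad \text{(drives eq. (2) to 1)},\\
\log(MS) &= n\big(I_B - I_E - 3\delta + I_E + 2\delta\big) = n(I_B - \delta) \qquad \text{(the condition of Proposition 5)},\\
\log(St) &= n(I_E + 2\delta) - n(I_E + \delta) = n\delta \qquad \text{(drives eq. (3) to 1)}.
\end{aligned}
```

In (2) and (3) the exponent therefore grows like 2^{nδ}, which overwhelms the merely exponential prefactors |X|^n and d_E^n.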
Thus, for sufficiently large n, there exist codewords u^{(ℓms)} ∈ T_Q which together have the property of ε-evenness, ε-secrecy and that a fraction of at least 1 − 2ε of the C_ℓ = (u^{(ℓms)})_{m,s} is ε-good. Clearly, we can construct such code sets for all types Q, of which there are at most (n + 1)^{|X|} many.



Now, the protocol works as follows: on observing x^n from the source, Alice determines its type Q and sends it to Bob. If x^n is not typical, i.e. if ‖P − Q‖_1 > δ, the protocol aborts here. Otherwise she selects a random ℓ such that x^n is a codeword of C_ℓ, as well as random m, s such that u^{(ℓms)} = x^n. (The latter choice of course is unique most of the time: if C_ℓ is a good code, only a fraction of at most ε of the codewords have a collision.) She also informs Bob of ℓ; if C_ℓ is not ε-good the protocol aborts.
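Alice's classical selection step can be sketched in Python as below. This is a toy illustration under stated assumptions: the codebook is a hypothetical hand-made dictionary rather than the random codes constructed above, the type announcement and typicality check are omitted, and `build_index` and `alice_encode` are names we introduce.

```python
import itertools
import random
from collections import defaultdict

def build_index(codebook):
    """Invert a codebook {(l, m, s): codeword} into {codeword: [(l, m, s), ...]}."""
    index = defaultdict(list)
    for label, word in codebook.items():
        index[word].append(label)
    return index

def alice_encode(xn, index, rng):
    """Choose a random l with x^n in C_l, then a random (m, s) in C_l with
    u^(lms) = x^n. Returns (l, m, s), or None if the protocol must abort."""
    labels = index.get(tuple(xn), [])
    if not labels:
        return None  # x^n is not covered by any code: abort
    l = rng.choice(sorted({lab[0] for lab in labels}))
    m, s = rng.choice([(mm, ss) for (ll, mm, ss) in labels if ll == l])
    return l, m, s  # l is announced publicly, m is Alice's key, s stays hidden

# Hypothetical toy: the type class of binary length-4 strings with two ones
# (6 sequences), covered exactly once by L = 3 codes with M = 2 and S = 1.
words = sorted(set(itertools.permutations((0, 0, 1, 1))))
codebook = {(l, m, 0): words[2 * l + m] for l in range(3) for m in range(2)}
index = build_index(codebook)
print(alice_encode((1, 0, 1, 0), index, random.Random(0)))
```

In the real protocol every sequence of T_Q is covered about LMS/|T_Q| times (ε-evenness), so the random choices of ℓ and (m, s) are genuinely random; in this toy each sequence is covered exactly once.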

Note that by the ε-evenness of the codewords, the state of ABE conditional on Q and ℓ is

$$\frac{1}{MS} \sum_{m,s} (1 \pm \epsilon)\, |ms\rangle\langle ms|^A \otimes \rho^{BE}_{u^{(\ell m s)}}. \qquad (6)$$

(By way of notation, "1 ± ε" stands for any number in the interval [1 − ε, 1 + ε].) Now, since C_ℓ is a good code, Bob can apply the decoding POVM to his part of the system, and transform the state in eq. (6) into a state Ω with the property

$$\Big\| \Omega - \frac{1}{MS} \sum_{m,s} |ms\rangle\langle ms|^A \otimes |ms\rangle\langle ms|^B \otimes \rho^E_{u^{(\ell m s)}} \Big\|_1 \le 2\epsilon.$$

Both Alice and Bob measure m and end up with a key that is uniformly distributed up to the factors 1 ± ε of eq. (6), of length

$$n\big(I(Q;\rho^B) - I(Q;\rho^E) - 3\delta\big) \ge n\big(I(P;\rho^B) - I(P;\rho^E) - 3\delta - \delta'\big),$$

with probability ≥ 1 − 3ε, where

$$\delta' = 2\delta \log(d_A d_B d_E) + 2\tau(\delta),$$

with d_A = |X| and the dimensions d_B and d_E of Bob's and Eve's local systems, respectively. (Recall that Q is typical, and use the Fannes inequality stated in Appendix B as Lemma 17.) By the above property of Ω, Alice and Bob disagree with probability ≤ ε.

Finally, thanks to the ε-secrecy, for all ℓ and m,

$$\Big\| \frac{1}{S}\sum_{s=1}^S \rho^E_{u^{(\ell m s)}} - \sigma(Q) \Big\|_1 \le \epsilon,$$

so Eve's state after the protocol (including her knowledge of Q and ℓ) is almost constant, whatever the value of m. □

Remark 2 The communication cost of the protocol described in the above proof is asymptotically

$$H(X) - I(X;B) = H(A|B)$$

bits of forward communication (per copy of the state): the information about which code C_ℓ to apply, sent from Alice to Bob.
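The identity in Remark 2 holds for any cq-state, because H(AB) = H(X) + Σ_x P(x) H(ρ_x^B), so H(AB) − H(B) = H(X) − I(X;B). A quick numerical check in Python, on a hypothetical three-state qubit ensemble of our choosing:

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def pure(v):
    v = np.asarray(v, dtype=complex)
    v = v / np.linalg.norm(v)
    return np.outer(v, v.conj())

# Hypothetical cq-state: three non-orthogonal qubit states, non-uniform P.
P = np.array([0.5, 0.3, 0.2])
states = [pure([1, 0]), pure([1, 1]), pure([1, 1j])]

rho_B = sum(p * r for p, r in zip(P, states))
rho_AB = sum(p * np.kron(np.diag(np.eye(3)[x]), r)
             for x, (p, r) in enumerate(zip(P, states)))

H_A_given_B = entropy(rho_AB) - entropy(rho_B)                 # H(AB) - H(B)
holevo = entropy(rho_B) - sum(p * entropy(r) for p, r in zip(P, states))
H_X = float(-np.sum(P * np.log2(P)))
print(round(H_A_given_B, 6), round(H_X - holevo, 6))            # the two agree
```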

Here are the facts we use in the proof: 



Lemma 3 ("Operator Chernoff bound" [3]) Let X_1, ..., X_M be i.i.d. random variables taking values in the operators B(H) on the D-dimensional Hilbert space H, 0 ≤ X_j ≤ 1, with A = 𝔼X_j ≥ a1, and let 0 < η ≤ 1/2. Then

$$\Pr\left\{ \frac{1}{M}\sum_{j=1}^M X_j \notin [(1-\eta)A;\,(1+\eta)A] \right\} \le 2D \exp\left(-M\,\frac{a\eta^2}{2\ln 2}\right),$$

where [A; B] = {X : A ≤ X ≤ B} is an interval in the operator order. □

Note that for the case D = 1 this reduces to the classical Chernoff bound for bounded real random variables [?]. The case of finite vectors of bounded real random variables is also included, by considering diagonal matrices with the vector entries on the diagonal. The lemma is essential in the proof of the following result.
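Because the paper takes exp to base 2, the right-hand side of Lemma 3 simplifies: 2^{−y/ln 2} = e^{−y}, so the bound equals 2D e^{−Maη²/2}. The following Python sketch (hypothetical helper names; parameter values chosen only for illustration) uses this to find the smallest number of samples M for which the bound drops below a target failure probability.

```python
import math

def chernoff_rhs(M, D, a, eta):
    """Right-hand side of Lemma 3, 2D exp(-M a eta^2 / (2 ln 2)), exp to base 2."""
    return 2 * D * 2 ** (-M * a * eta ** 2 / (2 * math.log(2)))

def samples_needed(D, a, eta, p_fail):
    """Smallest M with chernoff_rhs(M, D, a, eta) <= p_fail.
    Since 2^(-y / ln 2) = e^(-y), the condition is 2D e^(-M a eta^2 / 2) <= p_fail."""
    return math.ceil(2 * math.log(2 * D / p_fail) / (a * eta ** 2))

M = samples_needed(D=4, a=0.1, eta=0.1, p_fail=1e-3)
print(M, chernoff_rhs(M, 4, 0.1, 0.1))
```

The dimension D only enters logarithmically, which is why the bound remains useful in the proofs above even when D = |T_Q| or D = d_E^n is exponentially large.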

Proposition 4 For a cq-channel W : X → S(H) and a type P, let U^{(j)} be i.i.d. according to the uniform distribution on the type class T_P, j = 1, ..., M. Define the state

$$\sigma(P) = \frac{1}{|T_P|} \sum_{x^n \in T_P} W_{x^n}.$$

Then for every ε, δ > 0, and sufficiently large n,

$$\Pr\left\{ \Big\| \frac{1}{M}\sum_{j=1}^M W_{U^{(j)}} - \sigma(P) \Big\|_1 > \epsilon \right\} \le 2 d^n \exp\left(-M t\,\frac{\epsilon^2}{288 \ln 2}\right),$$

with log t = −n(I(P;W) + δ), and d the dimension of H.

Proof. The proof is very close to that of the compression theorem for POVMs [54]. We reproduce a version of the argument in Appendix C. □

Proposition 5 (HSW theorem) Consider a cq-channel W : X → S(H) and a type P, and let U^{(i)} be i.i.d. according to the uniform distribution on the type class T_P, i = 1, ..., N. Then for every ε, δ > 0 and sufficiently large n, if log N ≤ n(I(P;W) − δ),

$$\Pr\big\{ \mathcal{C} = (U^{(i)})_{i=1}^N \text{ is } \epsilon\text{-good} \big\} \ge 1 - \epsilon.$$

Here we call a collection of codewords ε-good if there exists a POVM (D_i)_{i=1}^N on H^{⊗n} such that

$$\frac{1}{N} \sum_{i=1}^N \mathrm{Tr}\big(W_{u^{(i)}} D_i\big) \ge 1 - \epsilon.$$

Proof. This really is only a slight modification of the Holevo-Schumacher-Westmoreland argument [30]; we give the proof in Appendix C. □

This coding theorem puts us in the position to prove 
the following formula for the one-way secret key distilla- 
tion capacity of a cqq-state: 

Theorem 6 For every cqq-state ρ,

$$K_\to(\rho) = \lim_{n\to\infty} \frac{1}{n} K^{(1)}_\to(\rho^{\otimes n}),$$

with

$$K^{(1)}_\to(\rho) = \max_{U,\,T} \big[ I(U;B|T) - I(U;E|T) \big],$$

where the maximisation runs over all random variables U depending on X and T depending on U, i.e. there are channels Q and R such that U = Q(X) and T = R(U), and the above formula refers to the state

$$\rho^{TUABE} = \sum_{t,u,x} R(t|u)\, Q(u|x)\, P(x)\, |t\rangle\langle t|^T \otimes |u\rangle\langle u|^U \otimes |x\rangle\langle x|^A \otimes \rho_x^{BE}.$$

The ranges of U and T may be taken to have cardinalities |T| ≤ |X| and |U| ≤ |X|², and furthermore T can be taken a (deterministic) function of U.

Proof. Let us begin with the converse part, i.e. the inequality "≤": consider an (n, ε)-protocol with rate R; then by its definition, and using standard information inequalities and the Fannes inequality (Lemma 17),

$$\begin{aligned} nR &\le H(K) + n(\tau(\epsilon) + \epsilon R) \\ &\le I(K : K'\Lambda) + n(2\tau(\epsilon) + \epsilon R + \epsilon F) \\ &\le I(K; B\Lambda) + n(2\tau(\epsilon) + \epsilon R + \epsilon F) \\ &\le I(K; B\Lambda) - I(K; E\Lambda) + n(3\tau(\epsilon) + \epsilon R + 2\epsilon F + \epsilon \log d_E) \\ &= I(K; B|\Lambda) - I(K; E|\Lambda) + n\delta. \end{aligned}$$

Letting U = (K, Λ) and T = Λ we obtain

$$R \le \frac{1}{n} K^{(1)}_\to(\rho^{\otimes n}) + \delta,$$

with arbitrarily small δ as n → ∞.

The proof of the properties of U and T is given in Appendix C.

Now we come to the proof of the direct part, i.e. the inequality "≥": it is clearly sufficient to show that, for given U and T, the rate R = I(U;B|T) − I(U;E|T) is achievable. To this end, consider a protocol where Alice generates U and T for each copy of the state i.i.d., and broadcasts T: this leaves Alice, Bob and Eve in n copies of

$$\tilde\rho = \sum_{t,u,x} R(t|u)\, Q(u|x)\, P(x)\, |u\rangle\langle u|^A \otimes \rho_x^{BE} \otimes |t\rangle\langle t|^{B'} \otimes |t\rangle\langle t|^{E'}.$$

Observing

$$R = I(U;BB') - I(U;EE'),$$

we can invoke Theorem 1 and are done. □

Remark 7 Comparing this with the classical analogue in [?], it is a slight disappointment to see that here we do not get a single-letter formula. The reader may want to verify that the technique used there to single-letterise the upper bound does not work here, as it introduces conditioning on quantum registers, while our T has to be classical.



One can clearly also use a general three-party state ρ^{ABE} to generate secret key between Alice and Bob: a particular strategy certainly is for Alice to perform a quantum measurement described by the POVM Q = (Q_x)_{x∈X}, which leads to the state

$$\tilde\rho^{A'BE} = \sum_x |x\rangle\langle x|^{A'} \otimes \mathrm{Tr}_A\big(\rho^{ABE}(Q_x \otimes 1^{BE})\big).$$

Then, starting from many copies of the original state ρ, we now have many copies of ρ̃, and Theorem 6 can be applied. Because we can absorb the channel U|X into the POVM, we obtain the direct part ("≥") in the following statement:

Theorem 8 For every state ρ^{ABE},

$$K_\to(\rho) = \lim_{n\to\infty} \frac{1}{n} K^{(1)}_\to(\rho^{\otimes n}),$$

with

$$K^{(1)}_\to(\rho) = \max_{Q,\,T|X} \big[ I(X;B|T) - I(X;E|T) \big],$$

where the maximisation is over all POVMs Q = (Q_x)_{x∈X} and channels R such that T = R(X), and the information quantities refer to the state

$$\omega^{TA'BE} = \sum_{t,x} R(t|x)\, |t\rangle\langle t|^T \otimes |x\rangle\langle x|^{A'} \otimes \mathrm{Tr}_A\big(\rho^{ABE}(Q_x \otimes 1^{BE})\big).$$

The range of the measurement Q and the random variable T may be assumed to be bounded as follows: |T| ≤ d_A² and |X| ≤ d_A², and furthermore T can be taken a (deterministic) function of X.

Proof. After our remarks preceding the statement of the theorem, we have only the converse to prove. This will look very similar to the converse of Theorem 6. Even though we have not so far defined what a key distillation protocol is in the present context, we can easily do that now (and check that the procedure above is of this type): it consists of a measurement POVM Q = (Q_{ℓm})_{ℓ,m} for Alice and the POVMs D^{(ℓ)} for Bob, with the same conditions (1)-(3) as in the first paragraphs of this section, where as before we assume a rate bound on the public discussion: L ≤ 2^{nF}. This obviously generalises the definition we gave for cqq-states.

Consider an (n, ε)-protocol with rate R; using standard information inequalities and the Fannes inequality (Lemma 17) we can estimate as follows:

$$\begin{aligned} nR &\le H(K) + n(\tau(\epsilon) + \epsilon R) \\ &\le I(K : K'\Lambda) + n(2\tau(\epsilon) + \epsilon R + \epsilon F) \\ &\le I(K; B\Lambda) + n(2\tau(\epsilon) + \epsilon R + \epsilon F) \\ &\le I(K; B\Lambda) - I(K; E\Lambda) + n(3\tau(\epsilon) + \epsilon R + 2\epsilon F + \epsilon \log d_E) \\ &= I(K; B|\Lambda) - I(K; E|\Lambda) + n\delta. \end{aligned}$$

The measurement Q and T(ℓ, m) = ℓ are permissible in the definition of K^{(1)}_→(ρ^{⊗n}); hence we obtain

$$R \le \frac{1}{n} K^{(1)}_\to(\rho^{\otimes n}) + \delta,$$

with arbitrarily small δ as n → ∞.

It remains to prove the bounds on the range of X and T, for which we imitate the proof of the corresponding statement in Theorem 6; the full argument is given in Appendix C. □

Remark 9 Clearly the worst case for Alice and Bob is when Eve holds the system E of a purification |ψ⟩^{ABE} of ρ^{AB}, because every other extension ρ^{ABE} of ρ^{AB} can be obtained from the purification by a quantum operation acting on E.

Our result (at least in principle) characterises those bipartite states ρ^{AB} for which one-way key distillation is possible at positive rate. We have to leave open the question of characterising the states for which positive rates can be obtained by general two-way public discussion (compare the classical case [?]).

Note that the classical analogue of the "worst case" is total knowledge of Eve about both Alice's and Bob's random variables, which makes key distillation totally impossible. For quantum states, therefore, it must be some "non-classical" correlation which makes positive rates possible; it is tempting to speculate that a manifestation of entanglement is behind this effect.

We do not fully resolve this issue in the present paper; 
nevertheless, in a similar vein, we show in the following 
section that if p AB allows one-way distillation of EPR 
pairs at positive rates, then our cryptographic techniques 
give a construction of an entanglement distillation pro- 
tocol by a modification of key distillation protocols of a 
particular form. 



III. ONE-WAY ENTANGLEMENT DISTILLATION

Consider an arbitrary state ρ^{AB} between Alice and Bob. In [?] the task of distilling EPR pairs at optimal rate from many copies of ρ, via local operations and classical communication (LOCC), was introduced.

A one-way entanglement distillation protocol consists of

• A quantum instrument T = (T_ℓ)_{ℓ=1}^L for Alice. (An instrument [?] is a quantum operation with both classical and quantum outputs: it is modelled in general as a cp-map valued measure; for our purposes it is a finite collection of cp-maps which sum to a cptp map.)

• For each ℓ a quantum operation R_ℓ for Bob.

We call it an (n, ε)-protocol if it acts on n copies of the state ρ and produces a maximally entangled state

$$|\Phi_M\rangle = \frac{1}{\sqrt M} \sum_{m=1}^M |m\rangle^A \otimes |m\rangle^B$$

up to fidelity 1 − ε:

$$\Big\langle \Phi_M \Big| \sum_\ell (T_\ell \otimes R_\ell)(\rho^{\otimes n}) \Big| \Phi_M \Big\rangle \ge 1 - \epsilon.$$

Note that we may assume without loss of generality that T_ℓ and R_ℓ output states supported on the reduced states of Φ_M on Alice's and Bob's system, respectively: otherwise we could improve the fidelity.

A number R is an achievable rate if there exist, for every n, (n, ε)-protocols with ε → 0 and (1/n) log M → R as n → ∞. Finally,

$$D_\to(\rho) := \sup\{R : R \text{ achievable}\}$$

is the one-way (or forward) entanglement capacity of ρ.
In the case of Bell-diagonal two-qubit states,

$$\rho = p_{00}\Phi^+ + p_{01}\Phi^- + p_{10}\Psi^+ + p_{11}\Psi^-,$$

it was shown in [6] that D_→(ρ) ≥ 1 − H({p}), by a method called the "hashing protocol" (this was generalised recently to higher dimensions in [?]). Concerning lower bounds not much more is known, but there are numerous works dealing with upper bounds on the distillable entanglement: the entanglement of formation E_F(ρ) [?], the relative entropy of entanglement E_re(ρ) [51], the Rains bound R(ρ) [42], and the recently introduced squashed entanglement E_sq(ρ) [13].

To connect to the cryptographic setting discussed so far, construct a purification |ψ⟩^{ABE} of ρ, of which we are particularly interested in its Schmidt form

$$|\psi\rangle^{ABE} = \sum_x \sqrt{P(x)}\, |x\rangle^A \otimes |\psi_x\rangle^{BE}.$$
X 
Consider the following special strategy for a one-way secret key distillation protocol, on n copies of the state: Alice measures x^n (i.e. in the above Schmidt basis), and applies the secret key distillation protocol from Theorem 1; it is easy to evaluate the key rate:

$$I(P;\psi^B) - I(P;\psi^E) = H(B) - H(E) = H(B) - H(AB).$$

(The first equality holds because each |ψ_x⟩^{BE} is pure, so that H(ψ_x^B) = H(ψ_x^E); the second because |ψ⟩^{ABE} is pure.) By letting Alice and Bob execute this protocol "coherently", we can prove:

Theorem 10 (Hashing inequality)

$$D_\to(\rho) \ge H(\rho^B) - H(\rho^{AB}).$$

The right hand side here equals the negative conditional von Neumann entropy, −H(A|B), a quantity known as coherent information [?], which we denote (acknowledging its directionality) I_c(A⟩B). If the state this is referring to is not apparent from the context, we add it in subscript: I_c(A⟩B)_ρ.
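The rate in Theorem 10 is straightforward to evaluate for small examples. This Python sketch is our own; it uses the two-qubit isotropic family F Φ⁺ + (1 − F)(1 − Φ⁺)/3 purely as an example, and computes I_c(A⟩B) = H(B) − H(AB) with a partial trace implemented by reshaping.

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def trace_out_A(rho_AB, dA, dB):
    """Partial trace over the first tensor factor A of a state on A (x) B."""
    return np.trace(rho_AB.reshape(dA, dB, dA, dB), axis1=0, axis2=2)

def coherent_information(rho_AB, dA, dB):
    """I_c(A>B) = H(B) - H(AB) = -H(A|B), in bits."""
    return entropy(trace_out_A(rho_AB, dA, dB)) - entropy(rho_AB)

phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
Phi = np.outer(phi_plus, phi_plus.conj())
F = 0.9  # fidelity with Phi^+ (example value); the rest is spread over the complement
iso = F * Phi + (1 - F) * (np.eye(4) - Phi) / 3
print(coherent_information(Phi, 2, 2), coherent_information(iso, 2, 2))
```

For F = 0.9 the coherent information is about 0.37, so by Theorem 10 a positive EPR rate is achievable with one-way LOCC; for the maximally mixed state it is −1 and the bound is empty.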

Proof. Recall the structure of the protocol in the proof of Theorem 1: for each typical type Q we have a collection of codewords u^{(ℓms)}, ℓ = 1, ..., L, m = 1, ..., M and s = 1, ..., S from T_Q satisfying ε-evenness, ε-secrecy, and a fraction of at least 1 − 2ε of the codes C_ℓ = (u^{(ℓms)})_{m,s} are ε-good.

The first step of the protocol is that Alice measures the type Q non-destructively and informs Bob about the result. The protocol aborts if Q is not typical, i.e. if ‖P − Q‖_1 > δ. This leaves the post-measurement state

$$|\psi_Q\rangle^{ABE} = \frac{1}{\sqrt{|T_Q|}} \sum_{x^n \in T_Q} |x^n\rangle^A \otimes |\psi_{x^n}\rangle^{BE}.$$

Define now a quantum operation for Alice, with Kraus elements

$$A_\ell = \sqrt{\frac{|T_Q|}{(1+\epsilon)LMS}} \sum_{m,s} |ms\rangle\langle u^{(\ell m s)}|^A, \qquad \ell = 1, \ldots, L,$$

which we interpret as an instrument with outcomes ℓ and 0 (the latter completing the operators to a trace-preserving map): T_ℓ(α) = A_ℓ α A_ℓ†. This outcome is communicated to Bob. That these are really permissible Kraus operators we obtain from the ε-evenness condition, which guarantees Σ_ℓ A_ℓ†A_ℓ ≤ 1.

The outcome 0 (resulting in abortion of the protocol) is observed with probability at most ε (if n is large enough). The other outcomes ℓ all occur with the probability

$$\gamma(Q) = P^{\otimes n}(T_Q)\, \frac{1}{1+\epsilon}\, \frac{1}{L},$$

in which case the output state of the instrument is

$$\frac{1}{\sqrt{MS}} \sum_{m,s} |ms\rangle^A \otimes |\psi_{u^{(\ell m s)}}\rangle^{BE}. \qquad (7)$$

(The absence of the 1 ± ε factors when compared to the analogous eq. (6) in the proof of Theorem 1 is due to our having introduced the error event 0.)

Now, just as in the proof of Theorem 1, Bob decodes m and s, at least if C_ℓ is ε-good (which fails to happen with probability only 2ε). But he does it coherently, by introducing an ancilla system B' in a standard state |0⟩ and applying a unitary to extract ms into B, leaving in B' whatever is necessary to make the map unitary. This transforms the state in eq. (7) into a state

$$|\vartheta\rangle^{ABB'E} = \frac{1}{\sqrt{MS}} \sum_{m,s} |ms\rangle^A \otimes \Big( \sqrt{1-\epsilon_{ms}}\, |ms\rangle^B |\varphi_{ms}\rangle^{B'E} + \sqrt{\epsilon_{ms}}\, |\mathrm{bad}_{ms}\rangle^{BB'E} \Big),$$

where ε_ms is the probability of the code incorrectly identifying ms, so that |bad_ms⟩ is supported on B-states |m's'⟩ with m's' ≠ ms (in particular it is orthogonal to |ms⟩^B|φ_ms⟩^{B'E}). Now, because the code is ε-good,

$$F\Big( |\vartheta\rangle,\ \frac{1}{\sqrt{MS}} \sum_{m,s} |ms\rangle^A \otimes |ms\rangle^B |\varphi_{ms}\rangle^{B'E} \Big) \ge 1 - 2\sqrt{\epsilon},$$

where we have used the Markov inequality: at most a fraction √ε of the ε_ms can be larger than √ε. Since the decoding only affects Bob's registers, but certainly not E, we have

$$\psi^E_{u^{(\ell m s)}} = (1-\epsilon_{ms})\, \varphi^E_{ms} + \epsilon_{ms}\, \mathrm{bad}^E_{ms},$$

and hence, replacing |φ_ms⟩ where necessary by a suitable purification |φ_{u^{(ℓms)}}⟩^{B'E} of ψ^E_{u^{(ℓms)}}, we can assume that

$$\mathrm{Tr}_{B'} |\varphi_{u^{(\ell m s)}}\rangle\langle\varphi_{u^{(\ell m s)}}| = \psi^E_{u^{(\ell m s)}}. \qquad (8)$$

This implies

$$F\Big( |\vartheta\rangle,\ \frac{1}{\sqrt{MS}} \sum_{m,s} |ms\rangle^A \otimes |ms\rangle^B |\varphi_{u^{(\ell m s)}}\rangle^{B'E} \Big) \ge 1 - 3\sqrt{\epsilon}. \qquad (9)$$

At this point, Alice and Bob almost have their maximal entanglement of the m-variable. All that remains to be done is to disentangle Eve.

To begin, Alice measures the s-component of her register in the Fourier-transformed basis

$$|\hat t\rangle = \frac{1}{\sqrt S} \sum_{s=1}^S e^{2\pi i s t/S} |s\rangle, \qquad t = 1, \ldots, S,$$

and tells Bob the result t, who applies the phase shift

$$\sum_{s=1}^S e^{2\pi i s t/S} |s\rangle\langle s|$$

to the s-component of his register B. This transforms |ϑ⟩^{ABB'E} into a state |Θ⟩^{ABB'E} with

$$F\Big( |\Theta\rangle,\ \frac{1}{\sqrt{MS}} \sum_{m,s} |m\rangle^A \otimes |ms\rangle^B |\varphi_{u^{(\ell m s)}}\rangle^{B'E} \Big) \ge 1 - 3\sqrt{\epsilon}, \qquad (10)$$

invoking the non-decrease of the fidelity under quantum operations, applied to eq. (9).

Absorbing s into the register B', the right hand state in the last equation can be rewritten as

$$\frac{1}{\sqrt M} \sum_m |m\rangle^A \otimes |m\rangle^B |\psi_{\ell m}\rangle^{B'E}, \qquad (11)$$

with

$$|\psi_{\ell m}\rangle^{B'E} = \frac{1}{\sqrt S} \sum_s |s\rangle^{B'} |\varphi_{u^{(\ell m s)}}\rangle^{B'E}.$$

The reduced state of Eve of |ψ_ℓm⟩^{B'E} is

$$\sigma_{\ell m} = \frac{1}{S} \sum_s \psi^E_{u^{(\ell m s)}},$$

where we made use of eq. (8); by the ε-secrecy, σ_ℓm is at trace distance at most ε from the state we denoted σ(Q) in the proof of Theorem 1. By Lemma 18 in Appendix B, F(σ_ℓm, σ(Q)) ≥ 1 − ε.

Choosing a purification |ζ⟩^{B'E} of σ(Q), this means that there are unitaries U_ℓm on B' such that

$$F\big( (U_{\ell m} \otimes 1)|\psi_{\ell m}\rangle^{B'E},\ |\zeta\rangle^{B'E} \big) \ge 1 - \epsilon,$$

because the mixed state fidelity equals the maximum pure state fidelity over all purifications of the states, and all purifications are related by unitaries on the purifying system [?]. Hence, if Bob applies

$$U = \sum_m |m\rangle\langle m|^B \otimes U_{\ell m}^{B'}$$

to his share of the state, then the state in eq. (11) is transformed into a state |Ξ⟩^{ABB'E} with

$$F\big( |\Xi\rangle,\ |\Phi_M\rangle^{AB} \otimes |\zeta\rangle^{B'E} \big) \ge 1 - \epsilon. \qquad (12)$$

Of course, he actually works on |Θ⟩, so they end up with the state (1 ⊗ U ⊗ 1)|Θ⟩, which has fidelity ≥ 1 − 3√ε to |Ξ⟩; hence with eq. (12) we conclude (by simple geometry) that it has fidelity ≥ 1 − 12√ε to |Φ_M⟩^{AB} ⊗ |ζ⟩^{B'E}.

Nontypical Q, the event 0 or a bad code C_ℓ happen with total probability at most 4ε. In the "good" case, Alice and Bob distill, up to fidelity 1 − 12√ε, a maximally entangled state of log Schmidt rank

$$\begin{aligned} n\big(I(Q;\psi^B) - I(Q;\psi^E) - 3\delta\big) &\ge n\big(I(P;\psi^B) - I(P;\psi^E) - 3\delta - \delta'\big) \\ &= n\big(H(B) - H(E) - 3\delta - \delta'\big), \end{aligned}$$

with δ' just as at the end of the proof of Theorem 1. □
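The role of the Fourier-basis measurement can be checked on a small example. In this Python sketch (a simulation under simplifying assumptions, not the paper's construction), Alice's and Bob's s-registers hold S^{−1/2} Σ_s |s⟩|s⟩ entangled with arbitrary normalised vectors |φ_s⟩ of a register R standing in for B'E; outcomes are labelled t = 0, ..., S − 1, equivalent modulo S to the text's t = 1, ..., S. The function name `fourier_step` and the random test vectors are our own choices. Each outcome occurs with probability 1/S, and after Bob's phase shift the remaining state is S^{−1/2} Σ_s |s⟩|φ_s⟩ irrespective of t.

```python
import numpy as np

def fourier_step(phis, t):
    """Alice projects her s-register onto |t^> = S^{-1/2} sum_s e^{2 pi i s t/S}|s>;
    Bob then applies sum_s e^{2 pi i s t/S}|s><s| to his s-register.
    phis[s] is the normalised vector |phi_s> of the remaining register R.
    Returns the outcome probability and the corrected (B, R) state as an (S, dR) array."""
    S, dR = phis.shape
    psi = np.zeros((S, S, dR), dtype=complex)          # amplitudes psi[a, b, r]
    for s in range(S):
        psi[s, s, :] = phis[s] / np.sqrt(S)             # S^{-1/2} sum_s |s>_A |s>_B |phi_s>_R
    phase = np.exp(2j * np.pi * np.arange(S) * t / S)
    t_hat = phase / np.sqrt(S)                           # |t^> in the s basis
    post = np.einsum('a,abr->br', t_hat.conj(), psi)     # (<t^| (x) 1)|Psi>, unnormalised
    prob = float(np.vdot(post, post).real)
    post = post / np.sqrt(prob)
    corrected = phase[:, None] * post                    # Bob's diagonal phase shift on B
    return prob, corrected

rng = np.random.default_rng(7)
S, dR = 4, 3
phis = rng.normal(size=(S, dR)) + 1j * rng.normal(size=(S, dR))
phis /= np.linalg.norm(phis, axis=1, keepdims=True)
target = phis / np.sqrt(S)                               # S^{-1/2} sum_s |s>_B |phi_s>_R
for t in range(S):
    prob, state = fourier_step(phis, t)
    print(t, round(prob, 6), np.allclose(state, target))
```

Since the announced t is uniformly random and the corrected state does not depend on it, publishing t tells Eve nothing; this is why the phase information can be sent in the clear.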



Remark 11 The communication cost of the above protocol is asymptotically

$$\begin{aligned} H(A) - I(X;B) + I(X;E) &= H(A) - H(B) + H(E) \\ &= H(A) + H(E) - H(AE) \\ &= I(A:E) \end{aligned}$$

bits of forward classical communication per copy of the state (using H(X) = H(A) for the Schmidt-basis measurement and H(B) = H(AE) for the pure state ψ^{ABE}): the information about which code C_ℓ to use, plus the information from the measurement in the Fourier-transformed basis (|t̂⟩)_t.

Even though at first sight there seems to be little reason to believe that our procedure is optimal for this resource (consider for example a separable initial state: Alice will have mutual information with a purification, but clearly the best thing is to do nothing), it is amusing to see the quantum mutual information show up here.

It is in fact possible to show that, subject to another optimisation, the quantum mutual information between Alice and Eve indeed gives the minimum forward communication cost [?].

Example 12 It is interesting to compare our method to the original hashing protocol of [6], for the case of mixtures of Bell states

$$\rho = \sum_{i,j=0}^{1} p_{ij}\, \Phi_{ij},$$

with the numbering of the Bell states introduced in [6]:

$$\Phi_{00} = \Phi^+,\quad \Phi_{01} = \Phi^-,\quad \Phi_{10} = \Psi^+,\quad \Phi_{11} = \Psi^-.$$

The purification we use in the proof reads

$$|\psi\rangle^{ABE} = \sum_{i,j=0}^{1} \sqrt{p_{ij}}\, |\Phi_{ij}\rangle^{AB} |ij\rangle^E = \frac{1}{\sqrt 2}\Big( |0\rangle^A |\psi_0\rangle^{BE} + |1\rangle^A |\psi_1\rangle^{BE} \Big),$$

with

$$\begin{aligned} |\psi_0\rangle^{BE} &= \sqrt{p_{00}}\,|0\rangle^B|00\rangle^E + \sqrt{p_{01}}\,|0\rangle^B|01\rangle^E + \sqrt{p_{10}}\,|1\rangle^B|10\rangle^E + \sqrt{p_{11}}\,|1\rangle^B|11\rangle^E,\\ |\psi_1\rangle^{BE} &= \sqrt{p_{00}}\,|1\rangle^B|00\rangle^E - \sqrt{p_{01}}\,|1\rangle^B|01\rangle^E + \sqrt{p_{10}}\,|0\rangle^B|10\rangle^E - \sqrt{p_{11}}\,|0\rangle^B|11\rangle^E. \end{aligned}$$

Note that this is indeed a Schmidt decomposition. First of all, the communication cost of our protocol evaluates (using the symmetry between A and B) to

$$I(A:E) = I(B:E) = H(B) + H(E) - H(BE) = 1 + H(\{p\}) - 1 = H(\{p\}),$$

which is the same as in [6]. But the way of the hashing protocol is to "hash" information about the identity of the state in the Bell ensemble into approximately nH({p}) of the states, which then are measured locally and the results communicated. Our protocol, in contrast, has two very distinct communication parts: there is the "code information" (which amounts to error correction between Alice and Bob, with built-in privacy amplification for Eve's information about the basis state), and there is the "phase information" from the measurement in the Fourier-transformed basis. The first amounts to

$$H(X) - I(X;\psi^B) = H(p_{00} + p_{01},\ p_{10} + p_{11}),$$

while the second is

$$I(X;E) = H(\{p\}) - H(p_{00} + p_{01},\ p_{10} + p_{11}).$$
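The two communication parts of Example 12 can be verified numerically from the purification itself. The Python sketch below is illustrative (the distribution p is an arbitrary choice and the helper names are ours): it builds |ψ⟩^{ABE}, computes I(X;B) and I(X;E) for Alice's Schmidt-basis measurement, and compares them with the closed forms above and with the hashing rate 1 − H({p}).

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy in bits."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-np.sum(evals * np.log2(evals)))

def shannon(probs):
    p = np.array([q for q in probs if q > 0], dtype=float)
    return float(-np.sum(p * np.log2(p)))

def holevo(P, states):
    avg = sum(q * r for q, r in zip(P, states))
    return entropy(avg) - sum(q * entropy(r) for q, r in zip(P, states))

r2 = 1 / np.sqrt(2)
# Bell states on AB in the basis |00>,|01>,|10>,|11>, numbered as in the text:
# Phi_00 = Phi^+, Phi_01 = Phi^-, Phi_10 = Psi^+, Phi_11 = Psi^-.
bell = {(0, 0): [r2, 0, 0, r2], (0, 1): [r2, 0, 0, -r2],
        (1, 0): [0, r2, r2, 0], (1, 1): [0, r2, -r2, 0]}

def bell_diagonal_analysis(p):
    """Build |psi>^{ABE} = sum_ij sqrt(p_ij)|Phi_ij>^{AB}|ij>^E and return
    (I(X;B), I(X;E), H(X)) for Alice's measurement of x in the Schmidt basis."""
    psi = np.zeros((4, 4))                       # amplitudes indexed by (ab, e)
    for e, (ij, vec) in enumerate(sorted(bell.items())):
        psi[:, e] = np.sqrt(p[ij]) * np.array(vec)
    amp = psi.reshape(2, 2, 4)                   # indices (a = x, b, e)
    P, rho_B, rho_E = [], [], []
    for x in range(2):
        px = float(np.sum(np.abs(amp[x]) ** 2))
        v = amp[x] / np.sqrt(px)                 # |psi_x>^{BE} as a (b, e) matrix
        P.append(px)
        rho_B.append(v @ v.conj().T)
        rho_E.append(v.T @ v.conj())
    return holevo(P, rho_B), holevo(P, rho_E), shannon(P)

p = {(0, 0): 0.85, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.05}
I_XB, I_XE, H_X = bell_diagonal_analysis(p)
H_p = shannon(p.values())
h_half = shannon([p[0, 0] + p[0, 1], p[1, 0] + p[1, 1]])
print("key rate", round(I_XB - I_XE, 4), "vs 1 - H(p) =", round(1 - H_p, 4))
print("code info", round(H_X - I_XB, 4), "vs", round(h_half, 4))
print("phase info", round(I_XE, 4), "vs", round(H_p - h_half, 4))
```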

Our result leads to the general formula for one-way distillable entanglement:

Theorem 13 For any bipartite state ρ^{AB},

$$D_\to(\rho) = \lim_{n\to\infty} \frac{1}{n} D^{(1)}_\to(\rho^{\otimes n}),$$

with

$$D^{(1)}_\to(\rho) := \max_{\mathbf T} \sum_\ell \lambda_\ell\, I_c(A\rangle B)_{\rho_\ell},$$

where the maximisation is over quantum instruments T = (T_1, ..., T_L) on Alice's system, λ_ℓ = Tr T_ℓ(ρ^A) and ρ_ℓ = (1/λ_ℓ)(T_ℓ ⊗ id)(ρ). The range of ℓ can be assumed to be bounded, L ≤ d_A², and moreover each T_ℓ can be assumed to have only one Kraus operator: T_ℓ(σ) = A_ℓ σ A_ℓ†.

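Proof. First, for the direct part, it is sufficient to consider an instrument $\mathbf{T}$ on one copy of the state: if Alice performs the instrument $\mathbf{T}$ on each copy and communicates the result to Bob, who stores it in a register $B'$, they end up with the new state $\tilde\rho^{\otimes n}$, where

$$\tilde\rho^{ABB'} = \sum_\ell \lambda_\ell\, \rho_\ell^{AB} \otimes |\ell\rangle\langle\ell|^{B'}.$$

Observe that, since the classical flag contributes the same entropy $H(\{\lambda_\ell\})$ to $H(BB')$ and to $H(ABB')$,

$$I_c(A\rangle BB')_{\tilde\rho} = \sum_\ell \lambda_\ell\, I_c(A\rangle B)_{\rho_\ell},$$

thus application of the hashing inequality (theorem 10) to $\tilde\rho$ gives achievability.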
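The identity for the flagged state only uses the fact that $\tilde\rho$ is block diagonal in the flag basis. A minimal Python sketch, assuming the branch states are described only by hypothetical eigenvalue lists for $\rho_\ell^B$ and $\rho_\ell^{AB}$ (base-2 logarithms), compares both sides:

```python
import math


def shannon(probs):
    return -sum(q * math.log2(q) for q in probs if q > 0)


def flagged_entropy(lams, spectra):
    """Entropy of sum_l lam_l sigma_l (x) |l><l|, given the eigenvalues of each sigma_l.

    The operator is block diagonal in the flag basis, so its spectrum is the
    union of the block spectra, each scaled by lam_l.
    """
    return shannon([lam * mu for lam, spec in zip(lams, spectra) for mu in spec])


def coherent_info_flagged(lams, spec_B, spec_AB):
    """I_c(A>BB') = H(BB') - H(ABB') for the flagged state."""
    return flagged_entropy(lams, spec_B) - flagged_entropy(lams, spec_AB)


def average_coherent_info(lams, spec_B, spec_AB):
    """sum_l lam_l [H(rho_l^B) - H(rho_l^AB)]."""
    return sum(lam * (shannon(b) - shannon(ab)) for lam, b, ab in zip(lams, spec_B, spec_AB))


# Hypothetical instrument branches with outcome probabilities lam_l
lams = [0.5, 0.3, 0.2]
spec_B = [[0.5, 0.5], [0.5, 0.5], [1.0]]
spec_AB = [
    [0.9, 0.05, 0.03, 0.02],  # Bell-diagonal branch: I_c = 1 - H(p) > 0
    [0.25] * 4,               # maximally mixed two qubits: I_c = -1
    [1.0],                    # pure product state: I_c = 0
]
print(coherent_info_flagged(lams, spec_B, spec_AB), average_coherent_info(lams, spec_B, spec_AB))
```

The entropy $H(\{\lambda_\ell\})$ of the flag appears in both flagged entropies and cancels in the difference, which is the whole content of the identity.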

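For the converse, consider any one-way distillation protocol with rate $R$, producing a state $\Omega$ with $\|\Omega - \Phi_M\|_1 \le \epsilon$, where $nR = \log M$. Denote Alice's instrument by $\mathbf{T} = (T_\ell)_\ell$ and Bob's quantum operations by $R_\ell$, and write

$$\Omega = \sum_\ell (T_\ell \otimes R_\ell)(\rho^{\otimes n}) =: \sum_\ell \lambda_\ell\, \Omega_\ell.$$

Then, using the Fannes inequality (lemma 17), the convexity of the coherent information in the state [?] and quantum data processing [45],

$$\begin{aligned}
nR &\le H(\Omega^B) - H(\Omega^{AB}) + 2n(\tau(\epsilon) + \epsilon R)\\
   &= I_c(A\rangle B)_\Omega + 2n(\tau(\epsilon) + \epsilon R)\\
   &\le \sum_\ell \lambda_\ell\, I_c(A\rangle B)_{\Omega_\ell} + 2n(\tau(\epsilon) + \epsilon R)\\
   &\le \sum_\ell \lambda_\ell\, I_c(A\rangle B)_{\omega_\ell} + 2n(\tau(\epsilon) + \epsilon R),
\end{aligned}$$

where $\omega_\ell = \frac{1}{\lambda_\ell}(T_\ell \otimes \mathrm{id})(\rho^{\otimes n})$. Hence we get

$$R \le \frac{1}{n} D^{(1)}_{\to}(\rho^{\otimes n}) + \delta',$$

with arbitrarily small $\delta'$ as $n \to \infty$, and we are done.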

As for the bound on $L$ and the structure of $\mathbf{T}$, observe that if one $T_\ell$ has more than one Kraus operator, one can decompose $T_\ell(\sigma)$ into a sum of terms $A_{\ell j}\sigma A_{\ell j}^\dagger$, with corresponding probabilities satisfying $\lambda_\ell = \sum_j \lambda_{\ell j}$ and post-measurement states satisfying $\lambda_\ell \rho_\ell = \sum_j \lambda_{\ell j}\rho_{\ell j}$. Then, by the convexity of $I_c$ in the state [?],

$$\lambda_\ell\, I_c(A\rangle B)_{\rho_\ell} \le \sum_j \lambda_{\ell j}\, I_c(A\rangle B)_{\rho_{\ell j}}.$$

By the polar decomposition and the invariance of $I_c$ under local unitaries we may further assume that $A_\ell \ge 0$, i.e. $A_\ell = \sqrt{A_\ell^\dagger A_\ell}$; in this form the whole instrument is actually described by the POVM $(A_\ell^2)_\ell$, and each POVM corresponds to an instrument by taking as the $A_\ell$ the square roots of the POVM operators.

Now, invoking a theorem of Davies [?] (which actually is another application of Carathéodory's theorem, lemma 20), any POVM is a convex combination of extremal POVMs, which have at most $d_A^2$ non-zero elements each, and this convex decomposition clearly carries over to the instruments: $\mathbf{T} = \sum_j \pi_j \mathbf{T}_j$. Since then $\sum_\ell \lambda_\ell I_c(A\rangle B)_{\rho_\ell}$ is the same convex combination of similar such terms for the instruments $\mathbf{T}_j$, at least one of these gives a yield $\sum_\ell \lambda_{\ell,j} I_c(A\rangle B)_{\rho_\ell}$ that is at least as high: note that the cp-maps of $\mathbf{T}_j$ are scalar multiples of the $T_\ell$, hence the output state of $\mathbf{T}_j$ with classical result $\ell$ is $\rho_\ell$. □
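The passage from a POVM to an instrument with a single positive Kraus operator per outcome can be illustrated on a qubit. The sketch below uses a hypothetical example, the three-outcome "trine" POVM (so $L = 3 \le d_A^2 = 4$), takes $A_\ell = \sqrt{E_\ell}$ via a closed-form square root for $2\times 2$ positive semidefinite matrices, and checks that $\sum_\ell A_\ell^\dagger A_\ell = \mathbb{1}$. Only real matrices are handled, which suffices for this example.

```python
import math


def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def psd_sqrt_2x2(M):
    """Positive square root of a nonzero real 2x2 positive semidefinite matrix.

    By Cayley-Hamilton, R = sqrt(M) satisfies M + sqrt(det M) I = tr(R) R
    and tr(R)^2 = tr(M) + 2 sqrt(det M).
    """
    s = math.sqrt(max(0.0, M[0][0] * M[1][1] - M[0][1] * M[1][0]))
    t = math.sqrt(M[0][0] + M[1][1] + 2 * s)
    return [[(M[i][j] + (s if i == j else 0.0)) / t for j in range(2)] for i in range(2)]


def trine_povm():
    """Hypothetical example: E_k = (2/3)|psi_k><psi_k| with psi_k at angle 2*pi*k/3."""
    povm = []
    for k in range(3):
        c, s = math.cos(2 * math.pi * k / 3), math.sin(2 * math.pi * k / 3)
        povm.append([[2 / 3 * c * c, 2 / 3 * c * s], [2 / 3 * s * c, 2 / 3 * s * s]])
    return povm


povm = trine_povm()
kraus = [psd_sqrt_2x2(E) for E in povm]  # one Kraus operator A_l = sqrt(E_l) >= 0 per outcome
# A_l is symmetric, so A_l^dagger A_l = A_l A_l; the sum must be the identity
total = [[sum(mat_mul(A, A)[i][j] for A in kraus) for j in range(2)] for i in range(2)]
print(len(kraus), [[round(x, 12) for x in row] for row in total])
```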



IV. QUANTUM AND ENTANGLEMENT CAPACITIES

Horodecki$^3$ [33] have observed that the hashing inequality implies information theoretic formulas for a number of quantum capacities and for the distillable entanglement.

In particular, the quantum capacity of a quantum channel, either unassisted or assisted by forward or two-way classical communication, is given by a formula involving the coherent information (where we indicate the assisting resource in the subscript):

Theorem 14 Let $T: \mathcal{B}(\mathcal{H}_A) \to \mathcal{B}(\mathcal{H}_B)$ be any quantum channel. Then,

$$Q(T) = Q_{\to}(T) = \lim_{n\to\infty} \frac{1}{n} \max_{|\psi\rangle} I_c(A'\rangle B^n)_\omega,$$

with any pure state $\psi$ on $A'A^n$ and the state $\omega = (\mathrm{id}\otimes T^{\otimes n})(\psi^{A'A^n})$. Furthermore,

$$Q_{\leftrightarrow}(T) = \lim_{n\to\infty} \frac{1}{n} \sup I_c(A'\rangle B^n)_\omega,$$

with any pure state $\psi$ on $A'A^n$, two-way LOCC operation $V$ and the state

$$\omega = V\big((\mathrm{id}\otimes T^{\otimes n})(\psi^{A'A^n})\big).$$

Proof. See [33]. That forward communication does not help was proved in [5], and that the right hand side is an upper bound to $Q_{\leftrightarrow}$ was shown in [?, 45].

The idea of achievability is to distill the state $\omega$ and then use teleportation. This involves forward communication, but either it is free, or the whole procedure, including the distillation and the teleportation, uses only forward communication, which by [5] can be removed.

In [33] a similar formula (involving a coherent information $I_c(B\rangle A)_\omega$) was proposed for the quantum capacity with classical feedback. However, the proof as indicated above does not work in this case: we may indeed use the back-communication to help distillation, but teleportation needs forward communication, so we end up with a quantum channel code utilising two-way classical side communication, which is not known to be reducible to just back-communication. In fact, the results of Bowen [9] might be taken as an indication that for the erasure channel the capacities with feedback and with two-way side communication are different. □

We have already given a formula for the distillable entanglement using one-way LOCC in theorem 13.

Theorem 15 For any state $\rho^{AB}$,

$$D(\rho) = \lim_{n\to\infty} \frac{1}{n} \sup_V I_c(A'\rangle B')_\omega,$$

with any two-way LOCC operation $V$, where the coherent information refers to the state $\omega = V(\rho^{\otimes n})$.

Proof. For the direct part ("$\ge$") it is obviously sufficient to consider any two-way LOCC operation $V$ on the bipartite system which, applied to $\rho$, gives a state $\sigma$. Doing that for $n$ copies of $\rho$, application of the hashing inequality (theorem 10) shows that we can distill EPR pairs at rate $I_c(A'\rangle B')_\sigma$ from this.

Conversely, let $V_0$ be a two-way LOCC producing a state $\Omega$ with $\|\Omega - \Phi_M\|_1 \le \epsilon$, $nR = \log M$. Without loss of generality we may assume that $\Omega$ is supported within the tensor product of the supports of the reduced states of $\Phi_M$. Thus,

$$\begin{aligned}
nR &\le H(\Omega^B) - H(\Omega^{AB}) + 3n(\epsilon R + \tau(\epsilon))\\
   &= I_c(A\rangle B)_\Omega + 3n(\epsilon R + \tau(\epsilon))\\
   &\le \sup\{I_c(A\rangle B)_{V(\rho^{\otimes n})} : V \text{ two-way LOCC}\} + 3n(\epsilon R + \tau(\epsilon)),
\end{aligned}$$

and we are done. □



It was shown furthermore in [33] that, for an ensemble $\{p_i, \rho_i\}$ of bipartite pure states with average $\rho = \sum_i p_i \rho_i$, the hashing inequality implies

$$\Delta D := \sum_i p_i D(\rho_i) - D(\rho) \le \Delta I := H(\rho) - \sum_i p_i H(\rho_i).$$

This inequality was first exhibited in [23] for a class of examples, and conjectured to be true in general. Note that the inequality is trivially true (using only concavity of the entropy) if the left hand side is replaced by the corresponding loss of coherent information, $\sum_i p_i I_c(A\rangle B)_{\rho_i} - I_c(A\rangle B)_\rho$.
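The "trivially true" coherent-information version can be checked numerically. The sketch below is restricted, as an assumption for simplicity, to diagonal (classical) two-qubit states given as probability tables, with randomly drawn hypothetical ensembles and base-2 logarithms; it does not test $\Delta D \le \Delta I$ itself, which needs the hashing inequality. The gap $\Delta I$ minus the loss of coherent information equals $H(B) - \sum_i p_i H(B_i)$, which is non-negative by concavity.

```python
import math
import random


def shannon(probs):
    return -sum(q * math.log2(q) for q in probs if q > 0)


def coherent_info(joint):
    """I_c(A>B) = H(B) - H(AB) for a diagonal state given as {(a, b): prob}."""
    marg_B = {}
    for (_, b), q in joint.items():
        marg_B[b] = marg_B.get(b, 0.0) + q
    return shannon(marg_B.values()) - shannon(joint.values())


def random_joint(rng, dA=2, dB=2):
    raw = {(a, b): rng.random() for a in range(dA) for b in range(dB)}
    z = sum(raw.values())
    return {key: v / z for key, v in raw.items()}


rng = random.Random(7)
min_gap = float("inf")
for _ in range(1000):
    m = rng.randint(2, 4)
    w = [rng.random() for _ in range(m)]
    z = sum(w)
    w = [x / z for x in w]
    states = [random_joint(rng) for _ in range(m)]
    rho = {key: sum(wi * s[key] for wi, s in zip(w, states)) for key in states[0]}
    loss = sum(wi * coherent_info(s) for wi, s in zip(w, states)) - coherent_info(rho)
    holevo = shannon(rho.values()) - sum(wi * shannon(s.values()) for wi, s in zip(w, states))
    min_gap = min(min_gap, holevo - loss)  # = H(B) - sum_i w_i H(B_i) >= 0 by concavity
print(f"min(Delta I - loss of I_c) over samples: {min_gap:.4f}")
```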

History and relation to other work: 

The coherent information made its appearance in [?], where its relation to the quantum channel capacity was conjectured and many of its properties were proved. Independently, [?] proposed this quantity together with a heuristic argument, which, however, fell short of a proof. Only recently did Hamada [26] succeed in giving a lower bound on the quantum channel capacity in terms of the coherent information, still with a crucial restriction to stabiliser codes. It took until [?] for a full proof to be found, but then, quite quickly, one of us [19] discovered a proof based on private information transmission, an idea inspired by the work of Schumacher and Westmoreland [?].

Regarding entanglement distillation, the hashing inequality appears to have been a folk conjecture from the publication of [6] on, which, however, has received much less attention than the quantum channel coding problem. It was codified as an important conjecture in [33].

While completing the writing of the present paper we learned of the work [34], in which it is shown that the proof by random coding of the channel capacity theorem can be used to obtain the hashing inequality. It may be interesting to compare the proofs [?, 48] of the achievability of the coherent information to ours and to [?]. While we, on the face of it, take a detour via secret key distillation, the final procedure can be argued to be more direct: in particular, we do not require the "double blocking" which in the other approaches seems necessary to reduce to a situation in which Alice's end is in a maximally mixed state. Thus, presumably, our codes achieve rates approaching the coherent information more quickly, i.e. for smaller block length.



V. CONCLUSION 

Our findings not only transport an existing classical theory of distilling secret key from prior correlation (Maurer [?], Ahlswede and Csiszár [?], and follow-up work) to the quantum case, but also link this subject to entanglement distillation in an operational way: a coherent implementation of the basic secret key distillation protocol yielded an entanglement distillation protocol achieving the coherent information, and this in turn implies information theoretic formulas for distillable entanglement and quantum capacities.

We want to draw the reader's attention to several open questions that we have to leave: first of all, are there states for which $D_{\leftrightarrow}(\rho) < K_{\leftrightarrow}(\rho)$? Are there maybe even bound entangled states with positive key rate? A first step might be to find states such that $D_{\to}(\rho) < K_{\to}(\rho)$. Note that the potential gap between $K_{\to}$ and $D_{\to}$ comes from the possibility of having more general measurements on Alice's side than the complete von Neumann measurement in the Schmidt basis that was our starting point in the proof of theorem 10 (actually any complete von Neumann measurement would do): namely, in key distillation, a viable option for Alice is to discard part of her state (corresponding to using higher rank POVM elements) but keep that part secret from Eve all the same, while in entanglement distillation "Eve" is everything except Alice and Bob, so it is as if she would get the parts Alice decided to toss away.

A second group of open questions: in general, the optimisations in theorems 14, 15 and 13 are quite nasty, most of all because they involve a limit of many copies of the state. In the classical theory of secret key distillation, a single-letter formula for the optimal one-way key rate is proved in [?], so there might be hope at least for theorem ??. In contrast, the optimal rate of two-way protocols, or even a procedure to decide whether it is nonzero, is still to be found (see the very well-informed reviews [31] and [32]), which is why we concentrate on one-way protocols for now. It is known that distillability of entanglement may be absent for a single copy of a state but could appear for collective operations on several copies (see again the review [32], sections 6.3 and 7.2 and references therein), so there are only limited possibilities for making theorem 13 into a single-letter formula. Note in particular that the results of [?] (see also the discussion of Barnum, Nielsen and Schumacher [?], where the failure of subadditivity for the coherent information is observed) imply that single-letter maximisation of the coherent information will certainly not achieve the optimum distillability. It would therefore be good to have at least an a priori bound on the number $n$ of copies of the state which we have to consider to have $\frac{1}{n} D^{(1)}_{\to}(\rho^{\otimes n})$ within, say, $\epsilon$ of the optimal rate. In general, good single-letter lower and upper bounds [?] are still wanted!

Finally, we would like to know the public (classical) communication cost of distilling secret key and entanglement, respectively, in particular in the one-way scenario (which at any rate seems to be the one open to analysis). More generally, if we limit the amount of communication, can we determine the optimal distillation rates (see [21] for the communication cost of common randomness distillation)? In the entanglement case this should link up with initial efforts to understand the communication cost of various state transformation tasks [?]. A study concerning the forward communication cost of entanglement distillation is in preparation [?].
Appendix A: TYPES AND TYPICAL SUBSPACES

The following material can be found in most textbooks on information theory, e.g. [?, 16], or in the original literature on quantum information theory [?].

For strings of length $n$ from a finite alphabet $\mathcal{X}$, which we generically denote $x^n = x_1 \dots x_n \in \mathcal{X}^n$, we define the type of $x^n$ as the empirical distribution of letters in $x^n$: i.e., $P$ is the type of $x^n$ if

$$\forall x \in \mathcal{X}: \quad P(x) = \frac{1}{n}\,\big|\{k : x_k = x\}\big|.$$

It is easy to see that the total number of types is upper bounded by $(n+1)^{|\mathcal{X}|}$.
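The bound on the number of types is easy to see concretely: a type is fixed by the letter counts, i.e. by a vector of $|\mathcal{X}|$ non-negative integers summing to $n$. A small Python sketch (helper names are hypothetical) counts these vectors in closed form, cross-checks by brute-force enumeration for tiny $n$, and compares with $(n+1)^{|\mathcal{X}|}$:

```python
from collections import Counter
from itertools import product
from math import comb


def number_of_types(n, k):
    """Types of length-n strings over a k-letter alphabet: count vectors summing to n."""
    return comb(n + k - 1, k - 1)


def types_by_enumeration(n, k):
    """Brute force: collect the empirical count vectors of all k**n strings."""
    types = set()
    for s in product(range(k), repeat=n):
        counts = Counter(s)
        types.add(tuple(counts[x] for x in range(k)))
    return len(types)


for n, k in [(6, 3), (8, 2), (5, 4)]:
    assert number_of_types(n, k) == types_by_enumeration(n, k)
    print(n, k, number_of_types(n, k), (n + 1) ** k)
```

The count grows only polynomially in $n$, which is what makes the type-class decompositions in the appendix cheap.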

The type class of $P$, denoted $\mathcal{T}_P^n$, is defined as the set of all strings of length $n$ of type $P$. Obviously, the type class is obtained by taking all permutations of an arbitrary string of type $P$.

The following is an elementary property of the type class:

$$(n+1)^{-|\mathcal{X}|} \exp\big(nH(P)\big) \le |\mathcal{T}_P^n| \le \exp\big(nH(P)\big), \qquad (A1)$$

with the (Shannon) entropy $H(P)$.
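Eq. (A1) can be verified exactly for small examples, since $|\mathcal{T}_P^n|$ is a multinomial coefficient. The sketch below assumes base-2 logarithms and exponentials (the bounds hold in any consistent base) and compares in the log domain to avoid overflow; the count vectors are hypothetical examples.

```python
from math import factorial, log2, prod


def type_class_size(counts):
    """|T_P^n| = n! / prod_x N(x)!, where counts[x] = N(x) = n P(x)."""
    n = sum(counts)
    return factorial(n) // prod(factorial(c) for c in counts)


def entropy_bits(counts):
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)


def check_A1(counts):
    """Check (A1) in log2 form: nH - |X| log2(n+1) <= log2 |T_P^n| <= nH."""
    n, k = sum(counts), len(counts)
    log_size = log2(type_class_size(counts))
    nH = n * entropy_bits(counts)
    return nH - k * log2(n + 1) - 1e-9 <= log_size <= nH + 1e-9


for counts in [(3, 7), (10, 10, 5), (1, 0, 4, 5), (50, 30, 20)]:
    print(counts, type_class_size(counts), check_A1(counts))
```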

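For $\delta > 0$ and an arbitrary probability distribution $P$, define the set of $P$-typical sequences as

$$\mathcal{T}^n_{P,\delta} = \Big\{x^n : \Big|-\frac{1}{n}\log P^{\otimes n}(x^n) - H(P)\Big| \le \delta\Big\}.$$

By the law of large numbers, for every $\epsilon > 0$ and sufficiently large $n$,

$$P^{\otimes n}(\mathcal{T}^n_{P,\delta}) \ge 1 - \epsilon. \qquad (A2)$$

Furthermore:

$$|\mathcal{T}^n_{P,\delta}| \le \exp\big(n(H(P)+\delta)\big), \qquad (A3)$$

$$|\mathcal{T}^n_{P,\delta}| \ge (1-\epsilon)\exp\big(n(H(P)-\delta)\big). \qquad (A4)$$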
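For a binary distribution $P = (1-q, q)$ all strings with the same number of ones have the same probability, so the typical set can be measured exactly by summing over that number. The following Python sketch (base-2 logarithms; the values of $q$, $\delta$ and $n$ are hypothetical) shows $P^{\otimes n}(\mathcal{T}^n_{P,\delta})$ growing with $n$, as in (A2), and checks the size bounds behind (A3) and (A4).

```python
from math import comb, log2


def typical_set_stats(q, n, delta):
    """Return (H(P), |T^n_{P,delta}|, P^n(T^n_{P,delta})) for P = (1 - q, q) on {0, 1}.

    All strings with k ones have probability q^k (1-q)^(n-k), so they are either
    all delta-typical or all atypical; we therefore sum over k, not over strings.
    """
    H = -(q * log2(q) + (1 - q) * log2(1 - q))
    size, prob = 0, 0.0
    for k in range(n + 1):
        logp = k * log2(q) + (n - k) * log2(1 - q)  # log2 of one string's probability
        if abs(-logp / n - H) <= delta:
            size += comb(n, k)
            prob += 2.0 ** (log2(comb(n, k)) + logp)
    return H, size, prob


q, delta = 0.2, 0.05
for n in (100, 500, 2000):
    H, size, prob = typical_set_stats(q, n, delta)
    upper = n * (H + delta)               # (A3): log2 |T| <= n(H + delta)
    lower = log2(prob) + n * (H - delta)  # each typical string has probability <= 2^{-n(H - delta)}
    print(n, round(prob, 4), lower - 1e-9 <= log2(size) <= upper + 1e-9)
```

The same numbers describe the typical projector of a qubit density operator with eigenvalues $(1-q, q)$: its rank is the size of the typical set, and its trace against $\rho^{\otimes n}$ is the probability.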
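For a (classical) channel $W: \mathcal{X} \to \mathcal{Y}$ (i.e. a stochastic map, taking $x \in \mathcal{X}$ to a probability distribution $W_x$ on $\mathcal{Y}$) and a string $x^n \in \mathcal{X}^n$ of type $P$, we denote the output distribution of $x^n$ in $n$ independent uses of the channel by

$$W^n_{x^n} = W_{x_1} \otimes \cdots \otimes W_{x_n}.$$

Let $\delta > 0$, and define the set of conditional $W$-typical sequences as

$$\mathcal{T}^n_{W,\delta}(x^n) = \Big\{y^n : \Big|-\frac{1}{n}\log W^n_{x^n}(y^n) - H(W|P)\Big| \le \delta\Big\},$$

where $H(W|P) = \sum_x P(x)H(W_x)$ is the conditional entropy.

Once more by the law of large numbers, for every $\epsilon > 0$ and sufficiently large $n$,

$$W^n_{x^n}\big(\mathcal{T}^n_{W,\delta}(x^n)\big) \ge 1 - \epsilon. \qquad (A5)$$

Furthermore:

$$|\mathcal{T}^n_{W,\delta}(x^n)| \le \exp\big(n(H(W|P)+\delta)\big), \qquad (A6)$$

$$|\mathcal{T}^n_{W,\delta}(x^n)| \ge (1-\epsilon)\exp\big(n(H(W|P)-\delta)\big). \qquad (A7)$$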

All of these concepts and formulas have analogues as "typical projectors" $\Pi$ for quantum states: by virtue of the spectral decomposition, the eigenvalues of a density operator can be interpreted as a probability distribution over eigenstates. The subspaces spanned by the typical eigenstates are the "typical subspaces". The trace of a density operator with one of its typical projectors is then the probability of the corresponding set of typical sequences.

Notations like $\Pi^n_{\rho,\delta}$ etc. should be clear from this. There is only one statement about density operators that we shall use which is not of this form:

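Lemma 16 (Operator law of large numbers) Let $x^n \in \mathcal{X}^n$ be of type $P$, and let $W: \mathcal{X} \to \mathcal{S}(\mathcal{H})$ be a cq-channel. Denote the average output state of $W$ under $P$ as

$$W(P) = \sum_x P(x)\, W_x.$$

Then, for every $\epsilon > 0$ and sufficiently large $n$,

$$\mathrm{Tr}\big(W_{x^n}\, \Pi^n_{W(P),\delta}\big) \ge 1 - \epsilon.$$

Proof. See [53], Lemma 6. □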

Appendix B: MISCELLANEOUS FACTS 

This appendix collects some standard facts about var- 
ious functionals we use: entropy, fidelity and trace norm. 



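Lemma 17 (Fannes [24]) Let $\rho$ and $\sigma$ be states on a $d$-dimensional Hilbert space, with $\|\rho - \sigma\|_1 \le \delta$. Then $|H(\rho) - H(\sigma)| \le \delta\log d + \tau(\delta)$, with

$$\tau(\delta) = \begin{cases} -\delta\log\delta & \text{if } \delta \le 1/4,\\ 1/2 & \text{otherwise.} \end{cases}$$

Note that $\tau$ is a monotone and concave function.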
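A quick numerical sanity check of lemma 17 and of the stated properties of $\tau$ follows. This Python sketch assumes base-2 logarithms (so that $\tau(1/4) = 1/2$ and $\tau$ is continuous) and, for simplicity, tests only commuting (diagonal) states, for which the trace norm is the $\ell_1$ distance of the spectra; the random test states are hypothetical.

```python
import math
import random


def tau(delta):
    """tau from Lemma 17, with logarithms to base 2 (so that tau(1/4) = 1/2)."""
    if delta <= 0:
        return 0.0
    return -delta * math.log2(delta) if delta <= 0.25 else 0.5


def shannon(probs):
    return -sum(q * math.log2(q) for q in probs if q > 0)


def random_distribution(rng, d):
    w = [rng.random() for _ in range(d)]
    z = sum(w)
    return [x / z for x in w]


rng = random.Random(3)
worst = float("inf")
for _ in range(2000):
    d = rng.randint(2, 6)
    p, r = random_distribution(rng, d), random_distribution(rng, d)
    t = rng.random() ** 3  # bias towards small perturbations of p
    q = [(1 - t) * a + t * b for a, b in zip(p, r)]
    delta = sum(abs(a - b) for a, b in zip(p, q))  # trace norm of the diagonal difference
    slack = delta * math.log2(d) + tau(delta) - abs(shannon(p) - shannon(q))
    worst = min(worst, slack)
print(f"smallest slack in the Fannes bound: {worst:.4f}")
```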



Lemma 18 ([25]) Let $\rho$ and $\sigma$ be any two states on a Hilbert space. Then

$$1 - \sqrt{F(\rho,\sigma)} \le \frac{1}{2}\|\rho - \sigma\|_1 \le \sqrt{1 - F(\rho,\sigma)}.$$

□
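For commuting states the fidelity reduces to $F = \big(\sum_i \sqrt{p_i q_i}\big)^2$ and the trace norm to the $\ell_1$ distance, so lemma 18 can be spot-checked with random probability vectors. This is a minimal sketch restricted to that classical special case; the random inputs are hypothetical.

```python
import math
import random


def fidelity(p, q):
    """F(p, q) = (sum_i sqrt(p_i q_i))^2, the fidelity of commuting (diagonal) states."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q)) ** 2


def half_trace_distance(p, q):
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def random_distribution(rng, d):
    w = [rng.random() for _ in range(d)]
    z = sum(w)
    return [x / z for x in w]


rng = random.Random(11)
ok = True
for _ in range(2000):
    d = rng.randint(2, 5)
    p, q = random_distribution(rng, d), random_distribution(rng, d)
    F, T = fidelity(p, q), half_trace_distance(p, q)
    lower_ok = 1 - math.sqrt(F) <= T + 1e-12
    upper_ok = T <= math.sqrt(max(0.0, 1 - F)) + 1e-12
    ok = ok and lower_ok and upper_ok
print(ok)
```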



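Lemma 19 (Gentle measurement [?]) Let $\rho$ be a (subnormalized) density operator, i.e. $\rho \ge 0$ and $\mathrm{Tr}\,\rho \le 1$, and let $0 \le X \le \mathbb{1}$. Then, if $\mathrm{Tr}(\rho X) \ge 1 - \lambda$,

$$\big\|\sqrt{X}\rho\sqrt{X} - \rho\big\|_1 \le \sqrt{8\lambda}.$$

□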
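A non-commuting example shows that the $\sqrt{\lambda}$ scaling in lemma 19 is the right order. In the hypothetical example below, $\rho = |\psi\rangle\langle\psi|$ with $|\psi\rangle = \cos\theta|0\rangle + \sin\theta|1\rangle$ and $X = |0\rangle\langle 0|$, so $\lambda = \sin^2\theta$. The Python sketch computes the trace norm from the closed-form eigenvalues of a real symmetric $2\times 2$ matrix; the exact value is $\sin\theta\sqrt{\sin^2\theta + 4\cos^2\theta} \le 2\sin\theta$, against the bound $2\sqrt{2}\sin\theta$.

```python
import math


def trace_norm_sym2(M):
    """Trace norm of a real symmetric 2x2 matrix: the sum of |eigenvalues|."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = math.sqrt(max(0.0, tr * tr - 4 * det))
    return abs((tr + disc) / 2) + abs((tr - disc) / 2)


def gentle_measurement_gap(theta):
    """(||sqrt(X) rho sqrt(X) - rho||_1, sqrt(8 lambda)) for rho = |psi><psi|, X = |0><0|."""
    c, s = math.cos(theta), math.sin(theta)
    rho = [[c * c, c * s], [c * s, s * s]]
    # X is a projector, so sqrt(X) = X and sqrt(X) rho sqrt(X) = c^2 |0><0|
    post = [[c * c, 0.0], [0.0, 0.0]]
    diff = [[post[i][j] - rho[i][j] for j in range(2)] for i in range(2)]
    lam = 1 - c * c  # Tr(rho X) = 1 - lam
    return trace_norm_sym2(diff), math.sqrt(8 * lam)


for theta in (0.01, 0.1, 0.5, 1.2):
    lhs, rhs = gentle_measurement_gap(theta)
    print(f"theta={theta}: {lhs:.5f} <= {rhs:.5f}")
```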



Lemma 20 (Carathéodory's theorem [55], 1.6) Let $v_1, \dots, v_n$ be points in a $d$-dimensional $\mathbb{R}$-vector space, and let $p(1), \dots, p(n)$ be probabilities (i.e., non-negative and summing to 1). Then the convex combination

$$v = \sum_{i=1}^{n} p(i)\, v_i$$

can be expressed as a convex combination of (at most) $d+1$ of the $v_i$.

As a consequence, there exist probability distributions $p_j$ on $\{1, \dots, n\}$ and probability weights $\lambda_j$ such that $p = \sum_j \lambda_j p_j$ and, for all $j$,

$$v = \sum_{i=1}^{n} p_j(i)\, v_i, \qquad |\mathrm{supp}\, p_j| \le d+1.$$

□
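The first statement of lemma 20 has a constructive proof: while more than $d+1$ weights are positive, find an affine dependence $\sum_i c_i v_i = 0$, $\sum_i c_i = 0$ among the supporting points, and shift the weights along $-c$ until one of them vanishes. The following Python sketch implements this standard argument with exact rational arithmetic (the function names and the example points are hypothetical):

```python
from fractions import Fraction


def null_vector(rows, m):
    """A nonzero c with sum_i rows[r][i] * c[i] = 0 for every row r (needs m > len(rows))."""
    A = [list(r) for r in rows]
    pivots, r = [], 0
    for col in range(m):
        if r == len(A):
            break
        piv = next((i for i in range(r, len(A)) if A[i][col] != 0), None)
        if piv is None:
            continue  # no pivot in this column: it stays a free variable
        A[r], A[piv] = A[piv], A[r]
        pv = A[r][col]
        A[r] = [x / pv for x in A[r]]
        for i in range(len(A)):
            if i != r and A[i][col] != 0:
                f = A[i][col]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        pivots.append(col)
        r += 1
    free = next(col for col in range(m) if col not in pivots)
    c = [Fraction(0)] * m
    c[free] = Fraction(1)
    for i, col in enumerate(pivots):  # reduced row echelon form: pivot = -(free coefficient)
        c[col] = -A[i][free]
    return c


def caratheodory(points, weights):
    """New weights with the same convex combination and at most d + 1 nonzero entries."""
    d = len(points[0])
    w = [Fraction(x) for x in weights]
    while True:
        support = [i for i, x in enumerate(w) if x > 0]
        if len(support) <= d + 1:
            return w
        # affine dependence on the support: sum_i c_i v_i = 0 and sum_i c_i = 0
        rows = [[Fraction(points[i][k]) for i in support] for k in range(d)]
        rows.append([Fraction(1)] * len(support))
        c = null_vector(rows, len(support))
        # c is nonzero and sums to 0, so it has a positive entry
        t = min(w[i] / ci for i, ci in zip(support, c) if ci > 0)
        for i, ci in zip(support, c):
            w[i] -= t * ci  # stays >= 0, and at least one weight becomes 0


points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 1), (1, 3)]
weights = [Fraction(1, 6)] * len(points)
new_weights = caratheodory(points, weights)


def combination(ws):
    return tuple(sum(w * p[k] for w, p in zip(ws, points)) for k in range(2))


print(new_weights, combination(new_weights) == combination(weights))
```

Exact fractions are a deliberate choice here: with floating point, the weight that should vanish may end up as a tiny nonzero number, and the support would not shrink reliably.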



Appendix C: MISCELLANEOUS PROOFS 

Proof of proposition ??. The proof follows closely the argument of [?] and of [?]: begin by constructing the typical projectors $\Pi_{x^n}$ of the $W_{x^n}$, which, for $x^n$ of type $P$, is defined as the sum of the eigenstate projectors of $W_{x^n}$ with eigenvalues in the interval

$$\big[\exp(-n(H(W|P)+\delta)),\ \exp(-n(H(W|P)-\delta))\big],$$

with the conditional entropy $H(W|P) = \sum_x P(x)H(W_x)$. For sufficiently large $n$, by the law of large numbers, $\mathrm{Tr}(W_{x^n}\Pi_{x^n}) \ge 1 - \epsilon$. Now define

$$\omega_{x^n} := \Pi\,\Pi_{x^n} W_{x^n} \Pi_{x^n}\,\Pi,$$

where $\Pi$ is the typical projector of $\rho^{\otimes n}$, with $\rho = \sum_x P(x)W_x$, i.e. the sum of the eigenstate projectors with eigenvalues in the interval

$$\big[\exp(-n(H(\rho)+\delta)),\ \exp(-n(H(\rho)-\delta))\big].$$

These concepts are taken from [?] and [?], but see also appendix A. By the operator law of large numbers (lemma 16), for sufficiently large $n$,

$$\mathrm{Tr}(W_{x^n}\Pi) \ge 1 - \epsilon^2/8.$$

From this and the gentle measurement lemma we get

$$\|\omega_{x^n} - W_{x^n}\|_1 \le 2\epsilon. \qquad (C1)$$



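The strategy is now to apply the operator Chernoff bound to the $\omega_{x^n}$: they are supported on a subspace of dimension $\le \exp(n(H(\rho)+\delta))$, and are all upper bounded by $\exp(-n(H(W|P)-\delta))\,\mathbb{1}$.

The only remaining obstacle is that we need a lower bound on the average operator $\omega := \mathbb{E}\,\omega_{X^n}$ (over the random codeword $X^n$). To this end, let $\tilde\Pi$ be the projector onto the subspace spanned by the eigenvectors of $\omega$ with eigenvalues $\ge \exp(-n(H(\rho)+2\delta))$. In this way, for sufficiently large $n$,

$$\mathrm{Tr}(\omega\tilde\Pi) \ge 1 - \epsilon.$$

Defining the operators

$$\tilde\omega_{x^n} := \tilde\Pi\,\omega_{x^n}\,\tilde\Pi, \qquad \tilde\omega := \tilde\Pi\,\omega\,\tilde\Pi,$$

we can now apply lemma 3 to the (rescaled) $\tilde\omega_{X^n_j}$, for the random codewords $X^n_1, \dots, X^n_M$, and get

$$\Pr\Big\{\frac{1}{M}\sum_{j=1}^{M}\tilde\omega_{X^n_j} \notin \big[(1\pm\epsilon)\tilde\omega\big]\Big\} \le 2d^n \exp\Big(-M\exp\big(-n(I(P;W)+3\delta)\big)\frac{\epsilon^2}{2\ln 2}\Big). \qquad (C2)$$

But

$$\frac{1}{M}\sum_{j=1}^{M}\tilde\omega_{X^n_j} \in \big[(1\pm\epsilon)\tilde\omega\big]$$

implies $\|\tilde\Pi(\Omega - \omega)\tilde\Pi\|_1 \le \epsilon$, with $\Omega := \frac{1}{M}\sum_{j=1}^{M}\omega_{X^n_j}$, which in turn implies

$$\|\tilde\Pi\,\Omega\,\tilde\Pi - \omega\|_1 \le 2\epsilon. \qquad (C3)$$

In particular we get, invoking eq. (C1),

$$\mathrm{Tr}(\Omega\tilde\Pi) \ge \mathrm{Tr}\,\omega - 2\epsilon \ge 1 - 4\epsilon.$$

Hence, with the gentle measurement lemma (lemma 19, appendix B), we obtain

$$\|\tilde\Pi\,\Omega\,\tilde\Pi - \Omega\|_1 \le \sqrt{32\epsilon}. \qquad (C4)$$

Combining eqs. (C3) and (C4) via the triangle inequality gives

$$\|\Omega - \omega\|_1 \le 2\epsilon + \sqrt{32\epsilon}, \qquad (C5)$$

and using eq. (C1) to replace $\omega_{x^n}$ by $W_{x^n}$ in both of the above operators, we finally get

$$\Big\|\frac{1}{M}\sum_{j=1}^{M} W_{X^n_j} - \mathbb{E}\,W_{X^n}\Big\|_1 \le 6\epsilon + \sqrt{32\epsilon} \le 12\sqrt{\epsilon}.$$

The complement of this event has probability smaller than the right hand side of eq. (C2), and since $\delta$ was arbitrary we obtain our claim. □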
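The error bookkeeping at the end of this proof is simple arithmetic: (C5) contributes $2\epsilon + \sqrt{32\epsilon}$, and each of the two replacements via (C1) adds $2\epsilon$. A tiny Python check (the function name is hypothetical) confirms that the total stays below $12\sqrt{\epsilon}$ for all $0 < \epsilon \le 1$, using $6\epsilon \le 6\sqrt{\epsilon}$ and $\sqrt{32} < 6$:

```python
import math


def final_error(eps):
    """(C5): 2 eps + sqrt(32 eps); plus 2 eps for each of the two uses of (C1)."""
    return (2 * eps + math.sqrt(32 * eps)) + 2 * eps + 2 * eps


for eps in (1e-6, 1e-3, 0.1, 1.0):
    print(eps, final_error(eps), 12 * math.sqrt(eps))
```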

Proof of proposition ??. In [46] it is proved that, by selecting

$$N' = 2(n+1)^{|\mathcal{X}|}\exp\big(n(I(P;W) - \delta)\big)$$

codewords at random, i.i.d. according to $P^{\otimes n}$, one can construct a canonical decoding POVM such that the expectation (over the code $\mathcal{C}_{\rm HSW}$) of the average error probability $\bar p_e$ goes to zero:

$$\langle\bar p_e\rangle_{\mathcal{C}_{\rm HSW}} \le 9\epsilon + N'\exp\big(-n(I(P;W) - \delta/2)\big). \qquad (C6)$$

(See [?], eq. (34).) The first thing we note is that (for sufficiently large $n$) $\epsilon = \exp(-n\gamma)$ for a constant $\gamma > 0$ depending on $\delta$: this follows by inspection of section III of [?], where $\epsilon$ is introduced as the loss of probability mass from removing non-typical contributions. As non-typicality is defined by large deviation events for a sum of independent random variables, of the form

$$\log\lambda_{x^n} = \sum_k \log\lambda_k,$$

the Chernoff bound allows us to put exponential bounds on the non-typical mass.

Hence eq. (C6) can be rewritten, for sufficiently large $n$, as

$$\langle\bar p_e\rangle_{\mathcal{C}_{\rm HSW}} \le \exp(-n\beta), \qquad (C7)$$

with some $\beta > 0$.

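We want to show that $\mathcal{C}_{\rm HSW} \cap \mathcal{T}_P^n$ is a good approximation to a random code from the type class $\mathcal{T}_P^n$. Of course, it is not quite that, if only because it has a variable number of codewords! There is an easy fix to this problem: define, with $N = \exp(n(I(P;W) - \delta))$,

$$\mathcal{C} := \text{first } N \text{ elements of } \mathcal{C}_{\rm HSW} \cap \mathcal{T}_P^n,$$

which makes sense because we can put the codewords in the order in which we select them. If the intersection is too small, define $\mathcal{C}$ to be empty.

First of all, let us bound the error probability of $\mathcal{C}$:

$$\bar p_e(\mathcal{C}) = \frac{1}{N}\sum_{i=1}^{N} p_e(i) \le \frac{N'}{N}\cdot\frac{1}{N'}\sum_{i=1}^{N'} p_e(i) = \frac{N'}{N}\,\bar p_e(\mathcal{C}_{\rm HSW}). \qquad (C8)$$

Now, the event $|\mathcal{C}_{\rm HSW} \cap \mathcal{T}_P^n| < N$ happens extremely rarely: because $P^{\otimes n}(\mathcal{T}_P^n) \ge (n+1)^{-|\mathcal{X}|}$, the expected cardinality of the intersection is at least $2N$. But then, using the Chernoff bound,

$$\Pr\big\{|\mathcal{C}_{\rm HSW} \cap \mathcal{T}_P^n| < N\big\} \le \exp\Big(-N'\,\frac{1}{8\ln 2\,(n+1)^{|\mathcal{X}|}}\Big) \le \exp(-N/4).$$

By symmetry, it is clear that, conditional on $\mathcal{C} \neq \emptyset$, the code $\mathcal{C}$ is a uniformly random code of $N$ words from $\mathcal{T}_P^n$, i.e. it can be described by picking codewords i.i.d. and uniformly.

Hence, denoting by $\tilde{\mathcal{C}}$ a truly random code of $N$ words from $\mathcal{T}_P^n$, we have

$$\big\|\mathrm{Dist}(\mathcal{C}) - \mathrm{Dist}(\tilde{\mathcal{C}})\big\|_1 \le \exp(-N/4).$$

Observe that the left hand side is the total variational distance of distributions. Thus, putting this together with eqs. (C8) and (C7), we obtain

$$\begin{aligned}
\langle\bar p_e\rangle_{\tilde{\mathcal{C}}} &\le \langle\bar p_e\rangle_{\mathcal{C}} + \exp(-N/4)\\
&\le 2(n+1)^{|\mathcal{X}|}\exp(-n\beta) + \exp(-N/4)\\
&\le \exp(-n\beta/2),
\end{aligned}$$

for sufficiently large $n$. But this in turn implies, by the Markov inequality,

$$\Pr\big\{\bar p_e(\tilde{\mathcal{C}}) > \exp(-n\beta/4)\big\} \le \exp(-n\beta/4).$$

□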
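The step "the expected cardinality of the intersection is at least $2N$" rests on $P^{\otimes n}(\mathcal{T}_P^n) \ge (n+1)^{-|\mathcal{X}|}$ when $P$ is itself a type with denominator $n$. The following Python sketch evaluates $P^{\otimes n}(\mathcal{T}_P^n)$ exactly with rational arithmetic for a few hypothetical count vectors and checks that $N' P^{\otimes n}(\mathcal{T}_P^n)/N = 2(n+1)^{|\mathcal{X}|} P^{\otimes n}(\mathcal{T}_P^n) \ge 2$:

```python
from fractions import Fraction
from math import factorial, prod


def type_class_probability(counts):
    """Exact P^n(T_P^n) when P is the type with these letter counts, P(x) = counts[x] / n."""
    n = sum(counts)
    size = factorial(n) // prod(factorial(c) for c in counts)
    return size * prod(Fraction(c, n) ** c for c in counts)


for counts in [(3, 7), (5, 5, 10), (2, 0, 3, 15)]:
    n, k = sum(counts), len(counts)
    pt = type_class_probability(counts)
    # expected number of the N' = 2 (n+1)^k N i.i.d. codewords landing in T_P^n, in units of N
    ratio = 2 * (n + 1) ** k * pt
    print(counts, float(pt), ratio >= 2)
```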



Proof of range bounds in theorem ??. Here we prove that we may assume that $T$ is a deterministic function of $U$, $|T| \le |\mathcal{X}|$ and $|U| \le |\mathcal{X}|^2$. Observing

$$I(U;B|T) - I(U;E|T) = H(B|T) - H(B|UT) - \big[H(E|T) - H(E|UT)\big],$$

we aim at writing the four conditional entropies on the right as averages over similar such quantities, but with limited range of $U$ and $T$. To this end, observe that the channels $Q$ and $R$ induce a probability distribution $q(ut)$ on the values $ut$ of $\bar U := UT$ (of which $T$ clearly is a deterministic function), and that for each $ut$ there is the conditional distribution $P_{ut}$ on $\mathcal{X}$:

$$P_{ut}(x) = \Pr\{X = x \mid U = u, T = t\},$$

which has the property $\sum_{ut} q(ut)P_{ut} = P$. With these notations,

$$H(B|UT) = \sum_{ut} q(ut)\,S(\rho^B_{ut}), \quad \text{where } \rho^B_{ut} = \sum_x P_{ut}(x)\,\rho^B_x,$$

and similarly for $H(E|UT)$.

For each $t$, let $q(t) = \sum_u q(ut)$, which allows us to write down the conditional distribution $q(\cdot|t)$ on the points $u't'$:

$$q(u't'|t) = \begin{cases} q(u't')/q(t) & \text{if } t' = t,\\ 0 & \text{otherwise.} \end{cases}$$

With this, the conditional distribution $P_t$ on $\mathcal{X}$ can be written

$$P_t(x) = \Pr\{X = x \mid T = t\} = \sum_u q(ut|t)\,P_{ut}(x),$$

for which clearly $\sum_t q(t)P_t = P$. This allows us to write

$$H(B|T) = \sum_t q(t)\,S(\rho^B_t), \quad \text{where } \rho^B_t = \sum_x P_t(x)\,\rho^B_x,$$

and similarly for $E$.

Now, invoking Carathéodory's theorem, lemma 20, we can write, for all $t$,

$$q(\cdot|t) = \sum_j \lambda_{j|t}\, q_j(\cdot|t), \qquad (C9)$$

with probabilities $\lambda_{j|t}$ and conditional distributions $q_j$ such that, for all $j$,

$$\sum_{u't'} q_j(u't'|t)\,P_{u't'} = P_t \qquad (C10)$$

and $|\mathrm{supp}\,q_j(\cdot|t)| \le |\mathcal{X}|$. Another application of Carathéodory's theorem gives a convex decomposition

$$q = \sum_k p_k\, q_k, \qquad (C11)$$

such that the support of every $q_k$ has cardinality $\le |\mathcal{X}|$ and, for all $k$,

$$\sum_t q_k(t)\,P_t = P. \qquad (C12)$$

Eqs. (C9) and (C11) define random variables $J$ and $K$, respectively: by eqs. (C10) and (C12), for each value $JK = jk$ the conditional distributions of $T$ and $U$ define variables $T_{jk}, U_{jk} \mid X$, and so

$$H(B|T) - H(E|T) = H(B|TJ) - H(E|TJ) = \sum_{jk}\Pr\{JK = jk\}\big[H(B|T_{jk}) - H(E|T_{jk})\big].$$

In the same way,

$$-H(B|UT) + H(E|UT) = -H(B|UJK) + H(E|UJK) = \sum_{jk}\Pr\{JK = jk\}\big[-H(B|U_{jk}) + H(E|U_{jk})\big].$$

Hence there exist $j$ and $k$ such that

$$I(U;B|T) - I(U;E|T) \le I(U_{jk};B|T_{jk}) - I(U_{jk};E|T_{jk}),$$

and $U_{jk}$ and $T_{jk}$ satisfy the range constraints. □

Proof of range bounds in theorem ??. Here we prove that we may assume that $T$ is a deterministic function of $X$, $|T| \le d_A^2$ and $|X| \le d_A^2$.

Denote the POVM elements of the measurement producing $x$ and $t$ by $P_{xt}$, and introduce the coarse-grained operators $P_t = \sum_x P_{xt}$. To decompose the POVM using convexity arguments, we rewrite the completeness conditions as

$$\sum_t \pi_t\,\frac{d_A P_t}{\mathrm{Tr}\,P_t} = \mathbb{1}, \qquad \pi_t = \frac{\mathrm{Tr}\,P_t}{d_A},$$

$$\sum_x \pi_{x|t}\,\frac{P_{xt}}{\mathrm{Tr}\,P_{xt}} = \frac{P_t}{\mathrm{Tr}\,P_t}, \qquad \pi_{x|t} = \frac{\mathrm{Tr}\,P_{xt}}{\mathrm{Tr}\,P_t}.$$

Using Carathéodory's theorem, lemma 20, we can write

$$\pi = \sum_k \lambda_k\, q_k, \qquad (C13)$$

with distributions $q_k$ of support $\le d_A^2$ and such that, for all $k$,

$$\sum_t q_k(t)\,\frac{d_A P_t}{\mathrm{Tr}\,P_t} = \mathbb{1}. \qquad (C14)$$

Using Carathéodory's theorem once more, we obtain, for each $t$, a decomposition (with $\pi(\cdot|t)$ defined on the points $x't'$ as in the previous proof)

$$\pi(\cdot|t) = \sum_j \lambda_{j|t}\, q_j(\cdot|t), \qquad (C15)$$

with conditional distributions $q_j(\cdot|t)$ of support $\le d_A^2$ and such that, for all $j$,

$$\sum_{x't'} q_j(x't'|t)\,\frac{P_{x't'}}{\mathrm{Tr}\,P_{x't'}} = \frac{P_t}{\mathrm{Tr}\,P_t}. \qquad (C16)$$

Now, let $\bar X := XT$, of which $T$ clearly is a function. Then eqs. (C13) and (C15) define random variables $J$ and $K$, respectively: by eqs. (C14) and (C16), for each value $JK = jk$ we have a POVM $P^{(jk)}$ (whose output variable we denote $X_{jk}$, and the function $T$ of which we denote $T_{jk}$). Then (compare the previous proof),

$$H(B|T) - H(E|T) = H(B|TJ) - H(E|TJ) = \sum_{jk}\Pr\{JK = jk\}\big[H(B|T_{jk}) - H(E|T_{jk})\big],$$

$$-H(B|XT) + H(E|XT) = -H(B|XJK) + H(E|XJK) = \sum_{jk}\Pr\{JK = jk\}\big[-H(B|X_{jk}) + H(E|X_{jk})\big].$$

Hence there exist $j$ and $k$ such that

$$I(X;B|T) - I(X;E|T) \le I(X_{jk};B|T_{jk}) - I(X_{jk};E|T_{jk}),$$

and $X_{jk}$ and $T_{jk}$ satisfy the range constraints. □